Abstract 



Sampling errors limit the accuracy with which forms can be linked. Limitations on accuracy are 
especially important in testing programs in which a very large number of forms are employed. 
Standard inequalities in mathematical statistics may be used to establish lower bounds on the 
achievable linking accuracy. To illustrate results, a variety of equating problems are considered. 

In practice, the accuracy of equating or linking is limited because estimates used in the process 
are based on samples. Limitations on accuracy are encountered even under ideal conditions. 
These limitations can be computed by use of classical statistical inequalities. Bounds on accuracy 
involve the number of examinees involved in the equating process and the number of forms that 
must be linked. The limits also involve what assumptions are made concerning forms which 
contain common items. In typical cases, the most important issue in practice is that the number 
of examinees available per unit time is affected to only a very limited extent by an increase in the 
number of forms used within that time interval. Thus more administrations typically lead to 
fewer examinees per administration. If security considerations limit the number of administrations 
in which a form can be used, then it necessarily follows that more administrations per unit time 
result in more forms per unit time, and information concerning each form is based on fewer 
examinees. Two problems arise simultaneously. Because information concerning a form involves 
fewer examinees, estimates of form characteristics related to the difficulty of the form become less 
accurate. In addition, equating and linking involve comparison of different forms. As the number 
of forms becomes increasingly large due to security constraints, comparisons between different 
forms must in some cases become increasingly indirect. Two forms to be compared will not have 
been used together and will share no common items. Even more indirection is involved. Consider 
the following hypothetical case. A form used on September 1, 2009, may share common items 
with a form used on January 8, 2009, and with a form used on October 22, 2008. A form used 
on September 8, 2009, may share common items with a form used on February 22, 2009, and 
with a form used on July 21, 2008. Thus the September 1, 2009 form can only be linked to the 
September 8, 2009 form through whatever links are available for the forms used on July 21, 2008, 
October 22, 2008, January 8, 2009, and February 22, 2009. It may well be the case that none of 
these forms share any common items, so that further steps are needed to provide linkage. These 
many steps required to link the two forms used one week apart result in increased equating error 
due to sampling effects. 
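
The indirection described above can be pictured as a search for the shortest chain of forms joined by common items. The following is only a sketch: the first four links restate the hypothetical case in the text, while the form dated 2008-05-10 and its two links are invented here purely so that a chain exists, and the function name `shortest_link_chain` is likewise an assumption.

```python
from collections import deque

# Undirected "shares common items" relation between forms, keyed by date.
# The first four pairs restate the hypothetical case in the text; the
# 2008-05-10 form and its two links are invented for illustration only.
COMMON_ITEM_LINKS = [
    ("2009-09-01", "2009-01-08"),
    ("2009-09-01", "2008-10-22"),
    ("2009-09-08", "2009-02-22"),
    ("2009-09-08", "2008-07-21"),
    ("2008-10-22", "2008-05-10"),  # assumed link
    ("2008-05-10", "2008-07-21"),  # assumed link
]


def shortest_link_chain(links, start, goal):
    """Return the shortest chain of forms joining start to goal, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == goal:
            chain = []
            while form is not None:
                chain.append(form)
                form = previous[form]
            return chain[::-1]
        for nxt in neighbors.get(form, []):
            if nxt not in previous:
                previous[nxt] = form
                queue.append(nxt)
    return None


chain = shortest_link_chain(COMMON_ITEM_LINKS, "2009-09-01", "2009-09-08")
print(chain, "links:", len(chain) - 1)
```

Under these assumed links, the two forms used one week apart are joined only through four separate links, each of which contributes its own sampling error.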

In practice, as suggested by some of the results in this report, the standard error associated 
with equating typically will increase at least in proportion to the square root of the number of 
forms used in the time interval. When restrictions are placed on the number of times a form 
can be used, the standard error associated with equating typically will increase in proportion to 
the number of forms used in the time interval. This point is illustrated for mean equating in 
Section 1.2. Thus a testing program with quite satisfactory equating accuracy with six forms per 
year may have quite unsatisfactory equating accuracy with 60 forms per year. 

To illustrate the issues involved, it is helpful to look at some simple examples of equating 
procedures. In Section 1, some cases of mean equating are explored. In Section 3, linear equating 
is explored. In that section, application of results of linear equating is also discussed in terms 
of implications for equating by item response theory. Section 4 considers some consequences 
of the analysis in this report. A basic knowledge of equating methods is assumed (Kolen & 
Brennan, 2004; von Davier, Holland, & Thayer, 2004); however, most of the analysis relies on 
basic statistical theory. The examples are deliberately chosen to be relatively simple, so that the 
basic issues can be discussed. 



1 Mean Equating 

Mean equating is a very simple equating procedure in which a constant is added to the raw 
score to adjust for differences in form difficulty. The approach is most appropriate for observed 
scores that are normally distributed and have the same variance; however, it can be used more 
generally. For an initial example, consider the following equating sequence in which randomly 
equivalent groups of examinees are employed at each administration to link test forms. This 
example involves a case in which two test forms are used at each administration and a given test 
form is never used for more than two administrations. Let $T \ge 2$ administrations be considered 
for $N$ examinees. For example, one might have 24 administrations over a period of two years, with 
one administration per month, and there might be 240,000 examinees over the two-year period. 
For simplicity, let $M = N/(2T)$ be an integer. In Administration $t$, $1 \le t \le T$, let the $N/T$ examinees 
be assigned at random to two groups of $M$ examinees. Thus in the hypothetical example, 5,000 
examinees are in each of the two randomly equivalent groups at each administration. A total of 
U = T + 1 distinct forms numbered from 1 to U are used for the T administrations. Thus in the 
hypothetical example, U = 25 forms need to be linked. At Administration t, the first randomly 
equivalent group of examinees, Group 1, receives Form t, and the other group of examinees, 
Group 2, receives Form t + 1. In this design, Form t and Form t + 1 can be directly compared, for 
they are used on equivalent groups of examinees. 

With some assumptions, it is possible to compare Forms $t$ and $u$, $1 \le t < u \le U$, even 
when they are never employed in the same administration. To do so, let the observed score of 
Examinee $i$, $1 \le i \le M$, in Group $k$ at Administration $t$ be $X_{ikt}$. Note that it is assumed for 
simplicity that no examinee receives more than one form at an administration and no examinee 
appears in more than one administration. Thus Examinee $i$ of Group $k$ in Administration $t$ has no 
relationship to Examinee $j$ of Group $m$ at Administration $u$ unless $t = u$, $k = m$, and $i = j$. Let 
the observed score $X_{ikt}$ have mean $\mu_{kt}$ and variance $\sigma^2$. Thus the mean of the score $X_{ikt}$ depends 
on the Group $k$ of the examinee and the Administration $t$; however, the variance of the score 
is independent of both group and administration. This assumption, which simplifies analysis, 
is appropriate for mean equating. Consistent with the assumption that examinees for different 
groups and administrations are distinct individuals, assume that the $X_{ikt}$ are all independently 
distributed. To simplify discussion of the impact of equating error, assume that the reliability 
of each form is the same for each administration. Thus the reliability of Form $t$ or $t + 1$ at 
Administration $t$ is $\rho^2$, where $0 < \rho^2 < 1$. In mean equating, $d_t = \mu_{1t} - \mu_{2t}$ provides a measure 
of the difficulty of Form $t + 1$ relative to Form $t$. This measure is based on the distributions of 
raw scores at Administration $t$ for Groups 1 and 2. The fundamental assumption to make for 
comparisons of forms not used in the same administration is that the difference in difficulty of two 
forms would be the same were the forms used for other administrations. Thus one may let $D_1 = 0$ 
and $D_{u+1} = D_u + d_u$ for $u \ge 1$. Then $D_u$ is a measure of the relative difficulty of Form $u$ compared 
to Form 1. If Form 1 is the base form used in equating or linking, then a raw score of $x$ on Form $u$ 
would be converted to an equated raw score of $x + D_u$ on Form 1 if $D_u$ were known. This result 
can be obtained in stages. A score of $x$ on Form 2 is converted to a score of $x + D_2 = x + d_1$ on 
Form 1. Note that $d_1$ is greater than 0 if the mean score on Form 2 at Administration 1 is lower 
than the mean score on Form 1 at Administration 1 (Form 2 is more difficult than Form 1 
at Administration 1). A score of $x$ on Form 3 is converted to a score of $x + d_2$ on Form 2 based 
on comparison of Form 2 and Form 3 at Administration 2. In turn, the score of $x$ on Form 3 is 
converted to a score of $x + d_2 + d_1 = x + D_3$ on Form 1. In this manner, a score of $x$ on Form $u$ 
is eventually converted to a score of $x + D_u$ on Form 1. In the hypothetical example, the score $x$ 
from Form 3, which was used in Group 2 of the February administration in 
the first year, is thus converted to a score $x + D_3$ on Form 1, which was used in Group 1 of the 
January administration in the first year of testing. This conversion is also obtained with a series 
of equipercentile equatings if the $X_{ikt}$ are all normally distributed, the distributions are known, 
and chained equating is used. 
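
The staged conversion amounts to a running sum of the adjacent-form differences. A minimal sketch follows, assuming the differences $d_1, d_2, \ldots$ are known; the numerical values and the function names `conversion_constants` and `equate_to_base` are hypothetical and serve only to illustrate the recursion $D_1 = 0$, $D_{u+1} = D_u + d_u$.

```python
from itertools import accumulate


def conversion_constants(d):
    """Given d = [d_1, ..., d_{U-1}], return [D_1, ..., D_U] with D_1 = 0."""
    return list(accumulate(d, initial=0.0))


def equate_to_base(x, u, D):
    """Convert a raw score x on Form u (numbered from 1) to the Form 1 scale."""
    return x + D[u - 1]


d = [1.5, -0.5, 2.0]            # hypothetical d_1, d_2, d_3
D = conversion_constants(d)      # D_1, ..., D_4
print(D)
print(equate_to_base(20.0, 3, D))
```

With these invented differences, a score of 20 on Form 3 is carried first to Form 2 and then to Form 1, exactly as in the two-stage description above.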

In practice, the means required for the conversion of scores are not known and must be 
estimated. For each Group $k$ from each Administration $t$, the mean $\mu_{kt}$ of the scores $X_{ikt}$ for 
Group $k$ and Administration $t$ can be estimated by the sample mean 

$$\bar{X}_{kt} = M^{-1} \sum_{i=1}^{M} X_{ikt}.$$

Thus the difference $d_t$, $t \ge 1$, in the difficulty of Forms $t + 1$ and $t$ is estimated, based on 
Administration $t$, to be the difference 

$$\hat{d}_t = \bar{X}_{1t} - \bar{X}_{2t}$$

between the sample means for Group 1, which received Form $t$, and Group 2, which received 
Form $t + 1$. The estimate $\hat{d}_t$ is unbiased, so that the expectation of $\hat{d}_t$ is $d_t$, and the variance of 
$\hat{d}_t$ is $2\sigma^2/M$. In turn, the conversion size $D_u$ for conversion from Form $u$ to Form 1 is estimated 
for $u > 1$ by 

$$\hat{D}_u = \sum_{t=1}^{u-1} \hat{d}_t.$$

The estimate $\hat{D}_u$ is unbiased, so it has expectation $D_u$. Because the $\hat{d}_t$ are based on distinct 
administrations and are therefore independent, the variance of $\hat{D}_u$ is 
$(u - 1)[2\sigma^2/M] = 4(u - 1)T\sigma^2/N$. The standard error of $\hat{D}_u$ is then $2[(u - 1)T/N]^{1/2}\sigma$. 
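
The variance formula for the chained estimate can be checked by simulation. The sketch below is illustrative only: it draws the group means directly from normal distributions with variance $\sigma^2/M$ (a normality shortcut not required by the argument), sets every true difference $d_t$ to zero, and uses a small hypothetical design; the seed and the function name `simulate_D_hat` are arbitrary choices.

```python
import random
import statistics


def simulate_D_hat(u, M, sigma, rng):
    """One draw of the chained estimate for Form u when every d_t is 0.

    Each group mean is drawn as normal with variance sigma^2 / M, which
    assumes normal scores (a shortcut used only for this illustration).
    """
    se_mean = sigma / M ** 0.5
    return sum(rng.gauss(0.0, se_mean) - rng.gauss(0.0, se_mean)
               for _ in range(u - 1))


rng = random.Random(2010)        # arbitrary seed for reproducibility
u, M, sigma = 6, 50, 10.0        # hypothetical small design
draws = [simulate_D_hat(u, M, sigma, rng) for _ in range(20000)]
empirical = statistics.pvariance(draws)
theoretical = (u - 1) * 2 * sigma ** 2 / M
print(round(empirical, 2), theoretical)
```

The empirical variance of the simulated estimates should lie close to $(u - 1)[2\sigma^2/M]$, which is 20 for the values chosen here.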

For comparison, the variance of measurement of the score $X_{ikt}$ of Examinee $i$ in Group $k$ of 
Administration $t$ is $\sigma^2(1 - \rho^2)$, so that the ratio of the variance of equating error to the variance 
of measurement is 

$$G_u = 4(u - 1)T/[N(1 - \rho^2)]$$

in the case of Form $u$. This ratio increases as the form number $u$ and number of forms $U = T + 1$ 
increase, as the reliability $\rho^2$ increases, and as the total sample size $N$ decreases. The highest 
ratio is found for the last form used, for here $u = U = T + 1$. In the hypothetical example of 
24 administrations over two years for 240,000 examinees, suppose that $\sigma = 10$ and $\rho^2 = 0.9$. In 
this case, the variance of measurement is 10, and the estimate $\hat{D}_u$ has variance $(u - 1)/25$. This 
variance is only 0.04 for $u = 2$, and $G_2 = 0.004$ is a rather small ratio of variance of equating 
compared to variance of measurement. Nonetheless, for $u = U = 25$, the variance of $\hat{D}_U$ of 0.96 is 
not negligible, and $G_U = 0.096$ is large enough that equating error has some effect on the effective 
reliability of the test. Note that this example involves a substantial number of examinees for a 
testing program and a number of administrations over two years that is not exceptional. 
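
The figures in the hypothetical example can be reproduced directly from the two formulas. The following is a small sketch; the function names `equating_variance` and `variance_ratio` are chosen here for illustration and do not come from the report.

```python
def equating_variance(u, T, N, sigma):
    """Variance of the chained estimate for Form u: 4 (u - 1) T sigma^2 / N."""
    return 4 * (u - 1) * T * sigma ** 2 / N


def variance_ratio(u, T, N, rho2):
    """Ratio G_u of equating variance to variance of measurement."""
    return 4 * (u - 1) * T / (N * (1 - rho2))


T, N, sigma, rho2 = 24, 240_000, 10.0, 0.9
for u in (2, 25):
    print(u, equating_variance(u, T, N, sigma), variance_ratio(u, T, N, rho2))
```

For the stated values, the loop evaluates the equating variance and the ratio for the second form and for the last form, the two cases discussed above.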

Some further examples may help provide some perspective. If the total number of examinees 
is $N = 200{,}000$, the common reliability coefficient is $\rho^2 = 0.9$, a total of $T = 10$ administrations 
are used, and equating error is considered for the last form, so that $u = U = 11$, then $G_U = 0.02$ 
is relatively small, so that equating error is relatively small compared to measurement error for a 
form. This example applies to a testing program with many examinees and a moderate number of 
administrations. 

If the number of examinees is reduced to $N = 2{,}000$ but the reliability $\rho^2$ remains 0.9, the 
total number $T$ of administrations remains 10, and the form considered remains Form $U$, then 
$G_U = 2$ is very large. If the standard deviation of the scores is $\sigma = 10$, then the variance of 
measurement of 10 is half the variance 20 of equating. This example applies to a rather small 
testing program with very small administration sizes of 200 examinees. 

If $N = 200{,}000$ examinees receive the test at some time, the reliability coefficient is still $\rho^2 = 0.9$, the 
form number is $U = 51$, and the number of administrations is $T = 50$, then the ratio $G_U = 0.10$ is 
not negligible. The variance of equating error is a tenth of the variance of measurement. Here the 
number of examinees is fairly large, but the number of administrations is also large. 

Assessment of the impact of equating error depends on whether the examinee is regarded as 
taking a random examination or not. For an examinee who uses Form $u$ at Administration $u$, the 
effective variance of measurement is $\sigma^2(1 - \rho^2)(1 + G_u)$, the sum of the variance of measurement and the 
variance of equating. A slight change in the formula occurs for Form $u + 1$ and Administration 
$u$ because the examinee is part of the data used for estimation of $D_{u+1}$. The effective variance 
of measurement is then $\sigma^2(1 - \rho^2)(1 - M^{-1} + G_{u+1})$. If the examinee is regarded as taking a fixed 
examination, then the equating error $\hat{D}_u - D_u$ has approximate probability 0.05 of benefitting or 
harming the examinee by more than $1.96\,\sigma[(1 - \rho^2)G_u]^{1/2}$. This probability is exact if all score 
distributions are normal. This criterion is somewhat stricter. Relative to the standard error of 
measurement $\sigma(1 - \rho^2)^{1/2}$, one has 

$$1.96\,\sigma[(1 - \rho^2)G_u]^{1/2}/[\sigma(1 - \rho^2)^{1/2}] = 1.96\,G_u^{1/2}.$$
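
The relative criterion depends on the design only through the ratio $G_u$, so it is easy to tabulate. A minimal sketch, in which the function name `relative_equating_bound` and the default multiplier argument `z` are assumptions made for illustration:

```python
import math


def relative_equating_bound(G, z=1.96):
    """Return z * sqrt(G): the two-sided 5% bound on equating error,
    expressed in units of the standard error of measurement."""
    return z * math.sqrt(G)


for G in (0.02, 2.0):
    print(G, round(relative_equating_bound(G), 2))
```

The two values of $G$ used here are the ratios obtained earlier for the examples with 200,000 and with 2,000 examinees over 10 administrations.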

Consider the previous examples. In the example with 240,000 examinees, 24 administrations, 
a standard deviation of scores of $\sigma = 10$, and a reliability coefficient of 0.9, by the last form used 
($U = 25$), the effective variance of measurement, 10.96, is appreciably greater than the actual 
variance of measurement of 10. From the perspective of a fixed administration, the probability is 
0.05 that the equating error is at least 1.88, whereas the standard error of measurement is $10^{1/2} = 3.16$. 
By this criterion, there is a substantial possibility of a substantial impact of equating error on the 
reported score. 

In the case of $N = 200{,}000$ examinees, a reliability coefficient of $\rho^2 = 0.9$, a standard 
deviation $\sigma = 10$ of scores, Form $U = 11$, and Administration $T = 10$, the effective variance of 
measurement of 10.2 is not much more than the variance of measurement of 10. On the other 
hand, the probability is 0.05 that the equating error changes the score reported by at least 
$1.96\,G_U^{1/2} = 0.28$, an amount not negligible relative to a standard deviation of measurement of 3.16. 

In the case of a small number $N = 2{,}000$ of examinees, a reliability coefficient $\rho^2 = 0.9$, a 
standard deviation of scores of $\sigma = 10$, Form $U = 11$, and Administration $T = 10$, the effective 
variance of measurement of 30.0 is very large relative to the variance of measurement of 10, and 
the probability is 0.05 that equating error changes the reported score by at least $1.96\,G_U^{1/2} = 2.77$, 
a very large change relative to 3.16, the standard error of measurement. 

For an example with many examinees and many administrations, let $N = 200{,}000$ be the 
number of examinees, let $\rho^2 = 0.9$ be the reliability coefficient, let the form number be $U = 51$, 
and let the administration number be $T = 50$. In this case, the effective variance of measurement 
of 11.0 is substantially greater than the variance of measurement of 10, and the probability is 0.05 
that the equating error changes a score by at least $1.96\,G_U^{1/2} = 0.62$, a fairly large value relative to 
the standard error of measurement of 3.16. 

These computations illustrate a basic issue in terms of assessment design. As long as the 
total number of examinees under study does not vary, increasing the number of administrations 
dramatically increases the variability of results. The average variance of equating over all 
administrations is 

$$(2T)^{-1} \sum_{t=1}^{T} [4tT\sigma^2/N + 4(t - 1)T\sigma^2/N] = 2T^2\sigma^2/N.$$

For a fixed total number N of examinees, doubling the number of administrations quadruples 
the average variance of equating and doubles the root mean squared equating error for the N 
examinees. 
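
The closed form for the average variance, and the quadrupling that follows from it, can be confirmed with exact rational arithmetic. This is a sketch only; the function names are hypothetical, and the inputs ($N = 240{,}000$, $\sigma^2 = 100$) are borrowed from the earlier hypothetical example.

```python
from fractions import Fraction


def average_equating_variance(T, N, sigma2):
    """Average over administrations of the equating variances of the two
    forms used at each administration (Forms t + 1 and t)."""
    total = sum(Fraction(4 * t * T * sigma2, N)
                + Fraction(4 * (t - 1) * T * sigma2, N)
                for t in range(1, T + 1))
    return total / (2 * T)


def closed_form(T, N, sigma2):
    """The closed form 2 T^2 sigma^2 / N."""
    return Fraction(2 * T * T * sigma2, N)


N, sigma2 = 240_000, 100
for T in (12, 24, 48):
    print(T, float(average_equating_variance(T, N, sigma2)))
```

Each doubling of $T$ in the loop multiplies the printed average variance by four, in line with the statement above.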

1.1 Many Parallel Forms at Each Administration 

Modification of the method of data collection yields substantially different results. Suppose 
that the same $U \ge 2$ forms are used at each administration from 1 to $T$, and let the total number 
of examinees $N$ be a multiple of $TU$. At Administration $t$, let examinees be divided randomly 
into $K = U$ groups of equal size, and let Form $u$ be given to the $M = N/(TU)$ examinees in 
Group $u$. Let the score of Examinee $i$, $1 \le i \le M$, from Group $u$ at Administration $t$ be $X_{iut}$. As 
in Section 1, let $X_{iut}$ have mean $\mu_{ut}$ and variance $\sigma^2$. Assume that the $X_{iut}$ are all independently 
distributed, and assume that the reliability of Form $u$ at Administration $t$ is $\rho^2$, where $0 < \rho^2 < 1$. 
Assume that $\mu_{ut}$ satisfies an additive model $\mu_{ut} = \mu_{1t} - D_u$. In this case, $D_u$ measures the 
difficulty of Form $u$ relative to Form 1. The assumption is made that the relative difficulty of 
Form $u$ compared to Form 1 is the same for all administrations. If Form 1 is the base form, then 
a score $x$ on Form $u \ne 1$ is equated to a score $x + D_u$ on Form 1, provided that $D_u$ is known. 

In practice, the relative form difficulty $D_u$ of Form $u$ must be estimated for $u > 1$. For efficient 
estimation, let equating at Administration $t$ be based on all data available from Administration $t$ 
and from any prior administrations. Note that this procedure, although statistically appropriate, 
does lead to a situation in which two examinees with identical raw scores can have different 
reported scores if they take the same form at different administrations. For each Administration $h$, 
$1 \le h \le t$, the form difficulty $D_u$ has an unbiased estimate $\bar{X}_{1h} - \bar{X}_{uh}$, where the sample mean 

$$\bar{X}_{uh} = M^{-1} \sum_{i=1}^{M} X_{iuh}$$

estimates the population mean $\mu_{uh}$ for any Form $u$ and Administration $h$. The estimates 
$\bar{X}_{1h} - \bar{X}_{uh}$, $1 \le h \le t$, are independent and have common variance $2\sigma^2/M$. Thus the 
estimate of $D_u$ at Administration $t$ is obtained by averaging the estimates of form difficulty from 
the first $t$ administrations. The resulting estimate is 

$$\hat{D}_{u \cdot t} = t^{-1} \sum_{h=1}^{t} (\bar{X}_{1h} - \bar{X}_{uh}).$$

This estimate is unbiased, so that the expectation of $\hat{D}_{u \cdot t}$ is $D_u$, and the variance of $\hat{D}_{u \cdot t}$ is 
$2\sigma^2/(tM) = 2TU\sigma^2/(tN)$. For comparison, the variance of measurement of $X_{iut}$ is $\sigma^2(1 - \rho^2)$, so 
that the ratio of the variance of equating error to the variance of measurement is 

$$G_{u \cdot t} = 2TU/[tN(1 - \rho^2)]$$

for each Form $u$. This ratio decreases as the Administration $t$ increases and as the total sample 
size $N$ increases, but the ratio increases as the number $T$ of administrations, the number $U$ of 
forms, and the reliability increase. At the final administration, $t = T$, the ratio is $G_{u \cdot T} = 2U/[N(1 - \rho^2)]$. 

For comparison with the equating design presented at the start of Section 1, consider an 
example with $N = 200{,}000$ examinees, a reliability coefficient $\rho^2 = 0.9$, $U = 10$ forms, $T = 10$ 
administrations, and Administration $t = 10$. Then, for each Form $u > 1$, $G_{u \cdot t} = 0.001$ is quite 
small. Note how much smaller $G_{u \cdot t}$ is than the value $G_U = 0.02$ in Section 1 achieved for 
Form $U = 11$ for $N = 200{,}000$ examinees and $T = 10$ administrations. Even for the much less 
favorable case of the initial administration, $G_{u \cdot 1}$ is 0.01 for each Form $u > 1$. Thus the use of 
many parallel forms permits much more accurate estimation of the conversion constants $D_u$ than 
was the case in the design in Section 1. Nonetheless, much smaller sample sizes are much less 
satisfactory. Consider the case of $N = 2{,}000$ examinees. Let the reliability coefficient remain 
$\rho^2 = 0.9$, and let the number $T$ of administrations and the number $U$ of forms both remain 10. In this 
case, only 20 examinees in an administration receive the same form. Not surprisingly, $G_{u \cdot 1}$ is 1, so 
that the variance of the equating conversion is as large as the variance of measurement. By the 
last administration, $G_{u \cdot t}$ is 0.1, a figure which is not negligible but obviously much better than for 
the first administration. For a case with many administrations but a moderate number of forms, 
consider $N = 200{,}000$ examinees, a reliability coefficient $\rho^2 = 0.9$, $T = 50$ administrations, and 
$U = 10$ forms. By the final administration, $G_{u \cdot t} = 0.001$ is the same as in the previous example 
with 200,000 examinees. On the other hand, the situation for the initial administration is rather 
less satisfactory, for $G_{u \cdot 1}$ is then 0.05. Thus the variance of the examinee's reported score due to 
equating is 0.05 times as great as the variance of measurement. 
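
The ratios quoted in these examples all follow from the single formula for $G_{u \cdot t}$. The sketch below recomputes them; the function name `parallel_forms_ratio` is an assumption introduced only for this illustration.

```python
def parallel_forms_ratio(t, T, U, N, rho2):
    """Ratio G_{u.t} = 2 T U / [t N (1 - rho^2)] for the design in which
    all U forms are used at each of T administrations."""
    return 2 * T * U / (t * N * (1 - rho2))


cases = [
    # (t, T, U, N)
    (10, 10, 10, 200_000),
    (1, 10, 10, 200_000),
    (1, 10, 10, 2_000),
    (10, 10, 10, 2_000),
    (50, 50, 10, 200_000),
    (1, 50, 10, 200_000),
]
for t, T, U, N in cases:
    print(t, T, U, N, round(parallel_forms_ratio(t, T, U, N, 0.9), 4))
```

The rows correspond, in order, to the final and initial administrations of the three examples discussed above.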

In practice, despite the favorable results, the design with many parallel forms used in 
each administration can be difficult to apply, both due to limitations in the ability of testing 
programs to administer a large number of forms in the same administration and due to security 
considerations. Even if a very large number of forms can be regarded as only a limited security 
risk due to the difficulty of determining in advance the answers for the very large number of items 
in all the forms, there remains the problem of starting out. For initial administrations, equating 
accuracy can be quite limited. Some delay in initial reporting until more administrations are 
completed can alleviate the problem, but pressure to report scores promptly may render this 
design impractical. As a consequence, it is appropriate to consider other alternatives. 




1.2 Design Limits 

Somewhat more complex equating designs based on randomly equivalent groups may be 
developed. These designs do not result in improvements in results when the design in Section 1.1 
can actually be used, but the designs can be employed to indicate inherent limitations in equating 
accuracy once security considerations and reporting deadlines restrict form reuse and restrict the 
ability to delay reporting until data are available from more administrations. The fundamental 
issue is that under a restriction that no form can appear in more than a specified number of 
administrations, as the same number of examinees is divided into more administrations, the 
average equating error, as measured in mean squared error, across administrations becomes 
proportional to at least the square of the number of administrations. In terms of root mean 
squared error, this measure of accuracy is at least proportional to the number of administrations, 
so that multiplying the number of administrations by 10 decreases accuracy by the criterion of 
root mean squared error by a factor of 10. 

To discuss equating designs for randomly equivalent groups, the following design is introduced 
to generalize the equating designs of Sections 1 and 1.1. Consider $T \ge 2$ administrations, $N$ 
examinees, and $U \ge 2$ forms. At each Administration $t$, there are $K \ge 2$ equivalent groups 
$k$, $1 \le k \le K$, of $M = N/(KT) \ge 1$ examinees, and Group $k$ is administered Form $u_{kt}$. For 
simplicity, assume that $u_{11} = 1$, so that Form 1, the base form, is administered in Administration 1 
to the $M$ examinees in Group 1. Note the implicit assumption that $N$ is an integer multiple of 
$KT$. It is still assumed that the raw score $X_{ikt}$ of Examinee $i$ from Group $k$ at Administration $t$ 
is a random variable with mean $\mu_{kt}$ and variance $\sigma^2 > 0$, and it is assumed that the reliability 
coefficient is $\rho^2$ for each combination of form and administration. The examinee scores $X_{ikt}$ are 
still assumed to be mutually independent. The assumption on the mean $\mu_{kt}$ of the scores for 
Group $k$ of Administration $t$ is that $\mu_{kt}$ is additive in administration and form, so that 

$$\mu_{kt} = \alpha_t - D_{u_{kt}}$$

for some real constants $\alpha_t$, $1 \le t \le T$, and some real constants $D_u$, $1 \le u \le U$. To identify 
parameters, it is assumed that $D_1 = 0$, so that $\alpha_1$ is the expectation of the score of examinees at 
Administration 1 who receive the base form, Form 1. 

This additive model is consistent with the additive models previously employed in Sections 1 and 1.1. 
In Section 1, the number of groups in an administration is $K = 2$, the number of administrations 
is $T$, and the number of forms is $U = T + 1$. At Administration $t$, $1 \le t \le T$, the administered 
forms are $u_{1t} = t$ and $u_{2t} = t + 1$. The parameter difference $d_u = D_{u+1} - D_u = \mu_{1u} - \mu_{2u}$ 
for Form $u$, $1 \le u \le T$, so that $D_1$ is 0 and $D_{u+1} = D_u + d_u$ for $1 \le u \le T$. It follows that 
$\alpha_t = \mu_{1t} + D_t$ for $1 \le t \le T$. 

In Section 1.1, $K = U$ groups are used in each administration, and $u_{kt} = k$ for $1 \le k \le U$ 
and $1 \le t \le T$, so that Group $u$, $1 \le u \le U$, is administered Form $u$ at Administration $t$. Here 
$D_u = \mu_{1t} - \mu_{ut}$ for each Form $u$ and Administration $t$, and $\alpha_t = \mu_{1t}$ for $1 \le t \le T$ is the score 
mean for the examinees in Group 1 who received the base form, Form 1, at Administration $t$. 

In general, the basic feature of the additive model is that the difference 

$$\mu_{kt} - \mu_{k't} = D_{u_{k't}} - D_{u_{kt}}$$

in means is a function of the Forms $u_{kt}$ and $u_{k't}$ administered at Administration $t$ to Groups $k$ 
and $k'$. The difference has no further dependence upon the Administration $t$. The difference 
$\beta_t = \alpha_t - \alpha_1$ provides a measure of the relative proficiency of examinees at Administration $t$ 
compared to examinees at Administration 1. This proficiency difference is assumed independent 
of the Form $u$. The parameter $D_u$ provides a measure of the difficulty of Form $u$ relative to the 
difficulty of Form 1. This parameter is assumed not to depend on the administration. 

As in Sections 1 and 1.1, if the parameter $D_u$ is known for Form $u > 1$, then a score $x$ on 
Form $u$ is converted to a score $x + D_u$ on Form 1. To estimate the parameters $D_u$ for $u > 1$, least 
squares may be applied to obtain the least-squares estimate $\hat{D}_u$ of $D_u$ for $1 \le u \le U$. The constraint 
is imposed that $\hat{D}_1 = D_1 = 0$. Given the estimate $\hat{D}_u$ for a Form $u > 1$, a raw score of $x$ on 
Form $u$ can be equated to a raw score of $x + \hat{D}_u$ on Form 1. Computation of the least-squares 
estimates of the $D_u$ is a familiar task from the study of two-way analysis of variance with unequal 
numbers of observations in cells (Scheffé, 1959, p. 114), although conditions are required to ensure 
that all the parameters $D_u$, $2 \le u \le U$, are estimable. 

To obtain least-squares estimates, a three-dimensional array $m_{ktu}$, $1 \le k \le K$, $1 \le t \le T$, 
$1 \le u \le U$, is used to specify the relationship between groups, administrations, and forms. For 
$1 \le k \le K$, $1 \le t \le T$, and $1 \le u \le U$, let $m_{ktu}$ be 1 if $u_{kt} = u$, and let $m_{ktu}$ be 0 otherwise. For 
example, in Section 1, $m_{1tt} = m_{2t(t+1)} = 1$ for $1 \le t \le T$ and $m_{ktu} = 0$ if $u$ is not $t + k - 1$. In 
Section 1.1, $m_{ktk} = 1$ for $1 \le k \le K$ and $1 \le t \le T$ and $m_{ktu} = 0$ if $k \ne u$. 

A number of constraints on the $m_{ktu}$ necessarily exist. Only one form is administered at each 
administration to each group. Thus for each Administration $t$ and Group $k$, 

$$\sum_{u=1}^{U} m_{ktu} = 1.$$

The number of Groups $k$ receiving Form $u$ at Administration $t$ is 

$$m_{+tu} = \sum_{k=1}^{K} m_{ktu}.$$

In Sections 1 and 1.1, this number is 0 or 1, but equating designs may be considered in which 
$m_{+tu}$ can exceed 1. For Administration $t$, the number of groups is $K$, so that the sum 

$$\sum_{u=1}^{U} m_{+tu} = K. \tag{1}$$

The sum 

$$m_{++u} = \sum_{t=1}^{T} m_{+tu}$$

is the total number of groups that receive Form $u$ in some administration. Because $KT$ groups 
are present in the $T$ administrations, the summation 

$$\sum_{u=1}^{U} m_{++u} = KT. \tag{2}$$



Developing the least-squares equations requires consideration of instances in which two forms 
appear in the same administration. Let 

$$q_{uu'} = K^{-1} \sum_{t=1}^{T} m_{+tu} m_{+tu'}$$

for $1 \le u \le U$ and $1 \le u' \le U$. Note that $m_{+tu} m_{+tu'}$ is the number of pairs $(k, k')$ of groups, 
$1 \le k \le K$ and $1 \le k' \le K$, such that, at Administration $t$, Group $k$ receives Form $u$ and Group $k'$ 
receives Form $u'$. Use of (1) shows that 

$$m_{++u} = \sum_{u'=1}^{U} q_{uu'}. \tag{3}$$
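
The counts $m_{+tu}$, $m_{++u}$, and $q_{uu'}$ are simple to tabulate from a table of form assignments, and identities (1) to (3) can then be checked mechanically. The sketch below does this for the design of Section 1 with a hypothetical $T = 4$; the function names and the dictionary layout are illustrative assumptions.

```python
from fractions import Fraction


def chain_design(T):
    """Form assignments u_kt for the design of Section 1 (K = 2, U = T + 1).

    Returns a dict mapping (k, t) to a form number; all indices start at 1.
    """
    return {(k, t): t + k - 1 for t in range(1, T + 1) for k in (1, 2)}


def design_counts(assign, K, T, U):
    """Return m_{+tu}, m_{++u}, and q_{uu'} as dicts with 1-based keys."""
    admins, forms = range(1, T + 1), range(1, U + 1)
    m_tu = {(t, u): sum(1 for k in range(1, K + 1) if assign[(k, t)] == u)
            for t in admins for u in forms}
    m_u = {u: sum(m_tu[(t, u)] for t in admins) for u in forms}
    q = {(u, v): Fraction(sum(m_tu[(t, u)] * m_tu[(t, v)] for t in admins), K)
         for u in forms for v in forms}
    return m_tu, m_u, q


T, K = 4, 2
U = T + 1
m_tu, m_u, q = design_counts(chain_design(T), K, T, U)
eq1 = all(sum(m_tu[(t, u)] for u in range(1, U + 1)) == K
          for t in range(1, T + 1))
eq2 = sum(m_u.values()) == K * T
eq3 = all(m_u[u] == sum(q[(u, v)] for v in range(1, U + 1))
          for u in range(1, U + 1))
print(eq1, eq2, eq3)
```

Exact fractions are used so that the checks of (1), (2), and (3) are equalities rather than floating-point comparisons.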



For Group $k$, $1 \le k \le K$, in Administration $t$, $1 \le t \le T$, let $\bar{X}_{kt}$ be the average of the examinee 
scores $X_{ikt}$ for $1 \le i \le M$. For Administration $t$, let $\bar{X}_{+t}$ be the sum of the averages $\bar{X}_{kt}$ for 
$1 \le k \le K$. For Form $u$, $1 \le u \le U$, let $\bar{X}_u$ be the sum of $\bar{X}_{kt}$ for Administrations $t$ and 
Groups $k$ such that Group $k$ receives Form $u$ ($u_{kt} = u$). The estimates $\hat{D}_u$, $1 \le u \le U$, satisfy the 
simultaneous equations 

$$m_{++u} \hat{D}_u - \sum_{u'=1}^{U} q_{uu'} \hat{D}_{u'} = -\bar{X}_u + K^{-1} \sum_{t=1}^{T} m_{+tu} \bar{X}_{+t} \tag{4}$$

for $1 \le u \le U$, and $\hat{D}_1 = 0$. The $D_u$, $1 \le u \le U$, have uniquely defined estimates if the $m_{+tu}$, 
$1 \le t \le T$, $1 \le u \le U$, satisfy the inseparability conditions (Goodman, 1968) that each Form $u$ is 
used at least once in some Group $k$ and Administration $t$ ($m_{++u} > 0$ for $1 \le u \le U$) and no way 
exists to divide the $U$ form numbers from 1 to $U$ into two nonempty disjoint subsets $A$ and $B$ such 
that $q_{uu'} = 0$ if $u$ is in $A$ and $u'$ is in $B$. It will be assumed that the inseparability assumption 
holds. 
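
For a small design, the simultaneous equations can be solved directly. The sketch below applies them to the design of Section 1 with a hypothetical $T = 3$ and invented group means, dropping the equation and the unknown for Form 1 because $\hat{D}_1 = 0$. The elimination routine `solve` and all other names are illustrative assumptions, not the report's procedure.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A square, nonsingular)."""
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        rows[c] = [x / rows[c][c] for x in rows[c]]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]


def least_squares_D(assign, xbar, K, T, U):
    """Solve the normal equations with D_hat_1 = 0.

    assign[(k, t)] is the form given to Group k at Administration t and
    xbar[(k, t)] is that group's mean score.
    """
    admins, forms = range(1, T + 1), range(1, U + 1)
    m_tu = {(t, u): sum(1 for k in range(1, K + 1) if assign[(k, t)] == u)
            for t in admins for u in forms}
    m_u = {u: sum(m_tu[(t, u)] for t in admins) for u in forms}
    q = {(u, v): Fraction(sum(m_tu[(t, u)] * m_tu[(t, v)] for t in admins), K)
         for u in forms for v in forms}
    x_t = {t: sum(xbar[(k, t)] for k in range(1, K + 1)) for t in admins}
    x_u = {u: sum(x for kt, x in xbar.items() if assign[kt] == u) for u in forms}
    # Drop the equation and the unknown for Form 1, since D_hat_1 = 0.
    A = [[(m_u[u] if u == v else 0) - q[(u, v)] for v in range(2, U + 1)]
         for u in range(2, U + 1)]
    b = [-x_u[u] + sum(m_tu[(t, u)] * x_t[t] for t in admins) / K
         for u in range(2, U + 1)]
    return [Fraction(0)] + solve(A, b)


T, K = 3, 2
U = T + 1
assign = {(k, t): t + k - 1 for t in range(1, T + 1) for k in (1, 2)}
xbar = {(1, 1): Fraction(50), (2, 1): Fraction(48),   # hypothetical means
        (1, 2): Fraction(47), (2, 2): Fraction(49),
        (1, 3): Fraction(51), (2, 3): Fraction(50)}
D_hat = least_squares_D(assign, xbar, K, T, U)
print([float(d) for d in D_hat])
```

In this design the least-squares solution coincides with the chained estimate of Section 1, that is, with the running sum of the differences between the two group means at each administration.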

To examine the inseparability issue, first consider the equating design in Section 1. In this 
example, $m_{++u}$ is 2 for $1 < u < U$ and $m_{++u} = 1$ for $u$ equal to 1 or $U$. Forms $u$ and $u'$ can only 
appear in the same administration if $|u - u'| \le 1$. It follows that $q_{uu'} = 0$ if $|u - u'| > 1$, $q_{uu'} = 1/2$ 
if $|u - u'| = 1$, $q_{uu} = 1$ if $1 < u < U$, and $q_{11} = q_{UU} = 1/2$. Let $t$ and $t'$ be administration numbers 
for $1 \le t \le T$ and $1 \le t' \le T$, and let $u$ and $u'$ be form numbers for $1 \le u \le U$ and $1 \le u' \le U$. If 
$A$ and $B$ are disjoint nonempty subsets of the integers from 1 to $U$ and if each integer from 1 to $U$ 
is in either $A$ or $B$, then some $u$ and $u'$ must exist such that $u$ is in $A$, $u'$ is in $B$, and $|u - u'| = 1$. 
In such a case, $q_{uu'} > 0$. It follows that the inseparability assumption holds. 

In Section 1.1, results are even simpler. Here $m_{++u} = T > 0$ for each Form $u$ and 
$q_{uu'} = T/U > 0$ for any Forms $u$ and $u'$. Because $q_{uu'}$ is always positive, the inseparability 
condition holds. 
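
Inseparability is a connectivity condition: if forms are joined whenever $q_{uu'} > 0$, every form must be used and every form must be reachable from every other. A minimal sketch of such a check follows, using 0-based list indices; the function names and the separable counterexample are hypothetical.

```python
def is_inseparable(q):
    """Check inseparability for a square table q of the q_{uu'} values.

    A form that is never used has q[u][u] == 0; otherwise the forms must be
    connected when u and v are joined whenever q[u][v] > 0.
    """
    U = len(q)
    if any(q[u][u] <= 0 for u in range(U)):
        return False
    reached, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in range(U):
            if q[u][v] > 0 and v not in reached:
                reached.add(v)
                stack.append(v)
    return len(reached) == U


def chain_q(T):
    """q for the design of Section 1: K = 2 and U = T + 1 forms."""
    U = T + 1
    q = [[0.0] * U for _ in range(U)]
    for t in range(T):                 # this administration uses forms t, t + 1
        for a in (t, t + 1):
            for b in (t, t + 1):
                q[a][b] += 0.5         # each co-occurrence contributes 1 / K
    return q


def parallel_q(T, U):
    """q for the design of Section 1.1: every entry equals T / U."""
    return [[T / U] * U for _ in range(U)]


# Hypothetical separable design: Forms 1-2 and Forms 3-4 never meet.
separable = [[0.5, 0.5, 0.0, 0.0],
             [0.5, 0.5, 0.0, 0.0],
             [0.0, 0.0, 0.5, 0.5],
             [0.0, 0.0, 0.5, 0.5]]
print(is_inseparable(chain_q(4)), is_inseparable(parallel_q(10, 10)),
      is_inseparable(separable))
```

The separable table corresponds to two blocks of forms that are never administered together, in which case no comparison between the blocks is possible.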

To study equating accuracy, variances of the estimates $\hat{D}_u$ are needed for Forms $u > 1$. For 
this purpose, the complete covariance matrix of the $\hat{D}_u$, $1 \le u \le U$, is determined. This covariance 
matrix is determined in stages. To begin, consider the $U$ by $U$ symmetric matrix $C$ with the 
element in row $u$ and column $u'$, $1 \le u \le U$, $1 \le u' \le U$, equal to 

$$C_{uu'} = m_{++u} \delta_{uu'} - q_{uu'} + \frac{(K - 1)T}{U(U - 1)},$$

where the Kronecker function $\delta_{uu'}$ is 1 if $u = u'$ and 0 otherwise. Observe that (3) implies that 

$$\sum_{u'=1}^{U} (m_{++u} \delta_{uu'} - q_{uu'}) = 0$$

for $1 \le u \le U$. The inseparability condition implies that $C$ is positive-definite and invertible. 
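
A consequence of the last display is that every row of $C$ sums to $(K - 1)T/(U - 1)$. The sketch below builds $C$ for the design of Section 1 with a hypothetical $T = 5$ and checks this, along with symmetry, in exact arithmetic; the function name `c_matrix_chain` and the 0-based indexing are illustrative assumptions.

```python
from fractions import Fraction


def c_matrix_chain(T):
    """Matrix C for the design of Section 1 (K = 2, U = T + 1), 0-based."""
    K, U = 2, T + 1
    q = [[Fraction(0)] * U for _ in range(U)]
    for t in range(T):                       # forms t and t + 1 share a session
        for a in (t, t + 1):
            for b in (t, t + 1):
                q[a][b] += Fraction(1, K)
    m = [sum(row) for row in q]              # m_{++u} as the row sum of q
    shift = Fraction((K - 1) * T, U * (U - 1))
    return [[(m[u] if u == v else 0) - q[u][v] + shift for v in range(U)]
            for u in range(U)]


T = 5
U = T + 1
C = c_matrix_chain(T)
row_sums = [sum(row) for row in C]
target = Fraction((2 - 1) * T, U - 1)        # (K - 1) T / (U - 1) with K = 2
symmetric = all(C[u][v] == C[v][u] for u in range(U) for v in range(U))
print(row_sums == [target] * U, symmetric)
```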

By standard linear algebra, the matrix $C$ has a decomposition into eigenvalues and 
eigenvectors such that 

$$C_{uu'} = \sum_{v=1}^{U} \lambda_v w_{uv} w_{u'v},$$

where, for $1 \le v \le U$, the eigenvalue $\lambda_v > 0$, and the eigenvector $w_v$ with elements $w_{uv}$, 
$1 \le u \le U$, satisfies the orthogonality conditions 

$$\sum_{u=1}^{U} w_{uv} w_{uv'} = \begin{cases} 1, & v = v', \\ 0, & v \ne v', \end{cases}$$

for $1 \le v' \le v$. For $v = 1$, $\lambda_1 = (K - 1)T/(U - 1)$, and $w_{u1} = 1/U^{1/2}$. Standard linear algebra 
also implies that the inverse $C^{-1}$ of $C$ then has the element in row $u$ and column $u'$ equal to 

$$C^{uu'} = \sum_{v=1}^{U} \lambda_v^{-1} w_{uv} w_{u'v}.$$

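To obtain the covariance matrix of the estimated conversion adjustments $\hat{D}_u$, $1 \le u \le U$, the
differences

\[ \tilde{D}_u = \hat{D}_u - U^{-1} \sum_{u'=1}^{U} \hat{D}_{u'} \]

between the estimate $\hat{D}_u$ and the average $U^{-1}\sum_{u'=1}^{U} \hat{D}_{u'}$ are considered for $1 \le u \le U$. Obviously,
$\tilde{D}_u$ estimates $D_u - U^{-1}\sum_{u'=1}^{U} D_{u'}$. By the basic theory of estimable functions (Rao, 1973, pp.
224-226), $(\sigma^2/M)C^{uu'}$ is the covariance of the estimates $\tilde{D}_u$ and $\tilde{D}_{u'}$. Because $\hat{D}_u = \tilde{D}_u - \tilde{D}_1$, it
follows that the variance of $\hat{D}_u$ is

\[ \sigma^2(\hat{D}_u) = \frac{\sigma^2}{M}\left(C^{uu} - 2C^{u1} + C^{11}\right)
   = \frac{\sigma^2}{M} \sum_{v=2}^{U} \lambda_v^{-1} (w_{uv} - w_{1v})^2. \]

Obviously, $\sigma^2(\hat{D}_1) = 0$. More generally, the variance of $\hat{D}_u - \hat{D}_{u'}$, $u \ne u'$, is

\[ \sigma^2(\hat{D}_u - \hat{D}_{u'}) = \frac{\sigma^2}{M}\left(C^{uu} - 2C^{uu'} + C^{u'u'}\right)
   = \frac{\sigma^2}{M} \sum_{v=2}^{U} \lambda_v^{-1} (w_{uv} - w_{u'v})^2. \tag{5} \]

The average of the variances $\sigma^2(\hat{D}_u - \hat{D}_{u'})$ is

\[ \bar{\sigma}^2 = \frac{2\sigma^2}{M(U-1)} \sum_{v=2}^{U} \lambda_v^{-1}. \tag{6} \]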
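The variance formula for $\hat{D}_u$ can be checked numerically on a very small design. The following is a sketch under stated assumptions, not the report's own computation: the matrix below is $\mathbf{C}$ for a hypothetical chain design with $K = 2$, $T = 2$, and $U = 3$ (Forms 1 and 2 at Administration 1, Forms 2 and 3 at Administration 2), and `inverse` and `variance_factor` are illustrative helper names.

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan inverse of a nonsingular square matrix in exact rational arithmetic."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        # find a row with a nonzero pivot and move it into place
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

F = Fraction
# C_{uu'} = m_{++u} delta_{uu'} - q_{uu'} + (K-1)T/[U(U-1)] for the assumed chain design.
C = [[F(5, 6), F(-1, 6), F(1, 3)],
     [F(-1, 6), F(4, 3), F(-1, 6)],
     [F(1, 3), F(-1, 6), F(5, 6)]]
C_inv = inverse(C)

def variance_factor(u, u_prime=1):
    """Return C^{uu} - 2 C^{uu'} + C^{u'u'}; multiply by sigma^2 / M to get the variance."""
    i, j = u - 1, u_prime - 1
    return C_inv[i][i] - 2 * C_inv[i][j] + C_inv[j][j]

factors = [variance_factor(u) for u in (1, 2, 3)]
```

For this design the factors are 0, 2, and 4, so that the variance of $\hat{D}_u$ is $2(u-1)\sigma^2/M$, in agreement with the chain result of Section 1.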
This result is based on a classical relationship between mean squared differences and sample
variances. For real numbers $x_u$, $1 \le u \le U$,

\[ [U(U-1)]^{-1} \sum_{u=1}^{U} \sum_{u'=1}^{U} (x_u - x_{u'})^2
   = 2(U-1)^{-1} \left[ \sum_{u=1}^{U} x_u^2 - U^{-1} \left( \sum_{u=1}^{U} x_u \right)^2 \right]. \tag{7} \]

For $v > 1$, $\sum_{u=1}^{U} w_{uv} = 0$ and $\sum_{u=1}^{U} w_{uv}^2 = 1$. Thus (5) and (7) imply (6).
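The step from (5) and (7) to (6) can be written out explicitly. The following is a sketch of the intermediate algebra only, with (7) applied to $x_u = w_{uv}$ for each $v \ge 2$ and the averaging taken with the divisor $U(U-1)$ used in (7).

```latex
\begin{align*}
\bar{\sigma}^2
  &= \frac{1}{U(U-1)} \sum_{u=1}^{U} \sum_{u'=1}^{U} \sigma^2(\hat{D}_u - \hat{D}_{u'}) \\
  &= \frac{\sigma^2}{M} \sum_{v=2}^{U} \lambda_v^{-1}\,
     \frac{1}{U(U-1)} \sum_{u=1}^{U} \sum_{u'=1}^{U} (w_{uv} - w_{u'v})^2 \\
  &= \frac{\sigma^2}{M} \sum_{v=2}^{U} \lambda_v^{-1}\,
     \frac{2}{U-1} \left[ \sum_{u=1}^{U} w_{uv}^2
       - U^{-1} \Bigl( \sum_{u=1}^{U} w_{uv} \Bigr)^2 \right] \\
  &= \frac{2\sigma^2}{M(U-1)} \sum_{v=2}^{U} \lambda_v^{-1}.
\end{align*}
```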

A lower bound for $\bar{\sigma}^2$ is easily constructed by use of a classical inequality for the harmonic
and arithmetic means (Hardy, Littlewood, & Polya, 1952, pp. 26-27). For any positive real numbers $x_v$,
$2 \le v \le U$, their harmonic mean

\[ (U-1) \Big/ \sum_{v=2}^{U} x_v^{-1} \]

is never greater than their corresponding arithmetic mean

\[ (U-1)^{-1} \sum_{v=2}^{U} x_v, \]

with equality if, and only if, the $x_v$ are all equal. Because the trace $\sum_{u=1}^{U} C_{uu}$ of $\mathbf{C}$ is the sum of
its eigenvalues (Halmos, 1958, p. 105),

\[ \sum_{v=1}^{U} \lambda_v = U(K-1)T/(U-1). \]

Because $\lambda_1 = (K-1)T/(U-1)$, it follows that

\[ \sum_{v=2}^{U} \lambda_v = (K-1)T, \]

so that

\[ \bar{\sigma}^2 \ge \frac{2K\sigma^2(U-1)}{N(K-1)}, \]

with equality if, and only if, $\lambda_v$ is constant for $v > 1$. The condition that $\lambda_v$ is constant for $v > 1$
holds if, and only if, $\lambda_v = (K-1)T/(U-1)$ for $1 \le v \le U$ and $\mathbf{C} = [(K-1)T/(U-1)]\mathbf{I}$, where $\mathbf{I}$
is the $U$ by $U$ identity matrix. When the lower bound on $\bar{\sigma}^2$ is achieved,

\[ \sigma^2(\hat{D}_u - \hat{D}_{u'}) = \bar{\sigma}^2 \]

for Forms $u$ and $u' \ne u$.
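The lower bound on the average variance is simple to evaluate. The sketch below is illustrative only: the function name is hypothetical, and the numerical inputs ($N = 110{,}000$, $\sigma = 100$, and the two combinations of $K$ and $U$) are assumed values chosen to resemble the report's numerical illustration.

```python
def average_variance_lower_bound(N, K, U, sigma):
    """Lower bound 2 K sigma^2 (U - 1) / [N (K - 1)] on the average variance.

    The bound combines (6) with the harmonic-arithmetic mean inequality,
    sum_{v>=2} 1/lambda_v >= (U - 1)^2 / [(K - 1) T], and with M = N / (K T).
    """
    return 2 * K * sigma ** 2 * (U - 1) / (N * (K - 1))

# Assumed illustrative inputs.
bound_two_forms = average_variance_lower_bound(N=110_000, K=2, U=12, sigma=100)
bound_four_forms = average_variance_lower_bound(N=110_000, K=4, U=14, sigma=100)
```

The square roots of these values are bounds on the root mean squared error. The bounds need not be attained by a particular design.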

Observe that, given a fixed number $N$ of examinees, a fixed score variance $\sigma^2$, and a fixed
number $K$ of forms per administration, the lower bound on $\bar{\sigma}^2$ is proportional to $U - 1$, the
number of forms minus 1. The square root of $\bar{\sigma}^2$, a measure of root mean squared error, is then
proportional to $(U-1)^{1/2}$. This result does imply that more forms inevitably result in less
accuracy, but the rate of increase does not directly involve the number of administrations. The
key issue is that no constraint has been introduced on form reuse.

The lower bound on $\bar{\sigma}^2$ is achievable. It applies under the conditions of section 1.1, for
$m_{++u} = T$ and $q_{uu'} = T/K$ for Forms $u$ and $u'$, where $u$ and $u'$ are positive integers no greater
than $U = K$. Thus $\mathbf{C} = T\mathbf{I}$, and

\[ \bar{\sigma}^2 = 2\sigma^2 U/N. \]

The lower bound on $\bar{\sigma}^2$ is also achieved for the balanced incomplete block case with $m_{++u} = KT/U$,
$q_{uu} = T/U$, and $q_{uu'} = (K-1)T/[U(U-1)]$ for $u \ne u'$ (Cochran & Cox, 1957, ch. 11). In this
case,

\[ \bar{\sigma}^2 = \frac{2K\sigma^2(U-1)}{N(K-1)}. \]

Unfortunately, the practical constraints on the equating design of section 1.1 normally also
apply to balanced incomplete blocks. In an equating design with balanced incomplete blocks,
it is necessary that $KT/U$ and $(K-1)KT/[U(U-1)]$ both be integers. For a number
$T$ of administrations sufficiently large, this condition cannot hold if the security constraint is
imposed that, for some integer $Q \ge 2$, the total number $m_{++u}$ of times Form $u$ is used in some
group for some administration satisfies the constraint $m_{++u} \le Q$ for $1 \le u \le U$. Thus $KT/U$
cannot exceed $Q$ and $U$ must be at least $KT/Q$. Thus the lower bound on $\bar{\sigma}^2$ is then at least
$2K(KT - Q)\sigma^2/[Q(K-1)N]$, so that more administrations $T$ leads to a higher average variance
$\bar{\sigma}^2$.
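The effect of the reuse limit $Q$ can be made concrete with a short calculation. This is a sketch with assumed inputs ($N = 110{,}000$, $K = 4$, $Q = 4$, $\sigma = 100$) and a hypothetical function name; it only evaluates the expression $2K(KT - Q)\sigma^2/[Q(K-1)N]$ for several values of $T$.

```python
def reuse_limited_bound(N, K, T, Q, sigma):
    """Evaluate 2 K (K T - Q) sigma^2 / [Q (K - 1) N], which follows from U >= K T / Q."""
    return 2 * K * (K * T - Q) * sigma ** 2 / (Q * (K - 1) * N)

# Assumed illustrative inputs: fixed N, K, Q, and sigma, with a growing number T of administrations.
bounds_by_T = {T: reuse_limited_bound(N=110_000, K=4, T=T, Q=4, sigma=100)
               for T in (11, 22, 44)}
```

With $N$ held fixed, the bound grows roughly linearly in $T$, which is the point made in the text about frequent administration.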

Lower bounds can also be considered for variances of linear contrasts of the estimated
adjustments $\hat{D}_u$ for Forms $u$ from 1 to $U$. These bounds can provide insight into commonly
observed increases in variances of the $\hat{D}_u$ as the form number $u$ increases. Let $\mathbf{b}$ be the
$U$-dimensional vector with elements $b_u$, $1 \le u \le U$, where the sum of the $b_u$ is 0. Consider the
variance of the estimate

\[ \hat{\theta} = \sum_{u=1}^{U} b_u \hat{D}_u = \sum_{u=1}^{U} b_u \tilde{D}_u \]

of

\[ \theta = \sum_{u=1}^{U} b_u D_u. \]
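Let

\[ \mathbf{x}'\mathbf{y} = \sum_{u=1}^{U} x_u y_u \]

for any $U$-dimensional vector $\mathbf{x}$ with elements $x_u$ for $1 \le u \le U$ and any $U$-dimensional vector $\mathbf{y}$
with elements $y_u$, $1 \le u \le U$. Then

\[ \mathbf{y}'\mathbf{C}^{-1}\mathbf{y}\, \mathbf{x}'\mathbf{C}\mathbf{x} \ge (\mathbf{x}'\mathbf{y})^2 \]

(Rao, 1973, p. 54), with equality if

\[ \mathbf{y} = c\mathbf{C}\mathbf{x} \tag{8} \]

for some real $c$. Note that (8) implies that

\[ \mathbf{y}'\mathbf{C}^{-1}\mathbf{y} = c^2 \mathbf{x}'\mathbf{C}\mathbf{x} = c\,\mathbf{y}'\mathbf{x}. \]

The case of $\mathbf{x} = \mathbf{y} = \mathbf{b}$ shows that

\[ \sigma^2(\hat{\theta}) \ge \frac{\sigma^2 (\mathbf{b}'\mathbf{b})^2}{M\, \mathbf{b}'\mathbf{C}\mathbf{b}}, \]

where

\[ \mathbf{b}'\mathbf{C}\mathbf{b} = \sum_{u=1}^{U} m_{++u} b_u^2 - \sum_{u=1}^{U} \sum_{u'=1}^{U} b_u b_{u'} q_{uu'}. \]

By (3), it follows that

\[ \mathbf{b}'\mathbf{C}\mathbf{b} = 2^{-1} \sum_{u=1}^{U} \sum_{u'=1}^{U} (b_u - b_{u'})^2 q_{uu'}. \]

Equality holds if, and only if, for some real $c$,

\[ b_u = c \sum_{u'=1}^{U} q_{uu'} (b_u - b_{u'}) \]

for $1 \le u \le U$, so that

\[ \sigma^2(\hat{\theta}) = \sigma^2 c\, \mathbf{b}'\mathbf{b}/M. \]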
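The contrast bound can be evaluated directly from the $q_{uu'}$. The sketch below is an illustration under assumptions: the function name is hypothetical, the $q_{uu'}$ are those of a small chain design with $K = 2$, $T = 2$, and $U = 3$, and the values $\sigma^2 = 10{,}000$ and $M = 5{,}000$ are assumed for the example.

```python
from fractions import Fraction

def contrast_lower_bound(b, q, sigma2, M):
    """Lower bound sigma^2 (b'b)^2 / (M b'Cb) for a contrast b whose elements sum to 0."""
    if sum(b) != 0:
        raise ValueError("b must be a contrast: its elements must sum to 0")
    U = len(b)
    # b'Cb = (1/2) * sum_u sum_u' (b_u - b_u')^2 q_{uu'}
    bCb = Fraction(1, 2) * sum((b[u] - b[v]) ** 2 * q[u][v]
                               for u in range(U) for v in range(U))
    bb = sum(x * x for x in b)
    return Fraction(sigma2) * bb ** 2 / (M * bCb)

half = Fraction(1, 2)
# q_{uu'} for the assumed chain: Forms 1, 2 at Administration 1 and Forms 2, 3 at Administration 2.
q_chain = [[half, half, 0],
           [half, 1, half],
           [0, half, half]]
# Contrast D_3 - D_1.
bound = contrast_lower_bound([-1, 0, 1], q_chain, sigma2=10_000, M=5_000)
```

In this example the bound equals 8, which coincides with $2(U-1)\sigma^2/M$ for $U = 3$, so equality holds for this contrast and design.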



For example, if $v$ and $v'$ are distinct positive integers no greater than $U$, $b_v = 1$, $b_{v'} = -1$, and
$b_u = 0$ for $u$ neither equal to $v$ nor $v'$, then

\[ \sigma^2(\hat{D}_v - \hat{D}_{v'}) \ge \frac{4K\sigma^2}{M[(K-1)(m_{++v} + m_{++v'}) + 2Kq_{vv'}]}. \]

Equality holds only if $q_{uv} = q_{uv'}$ for $u$ neither $v$ nor $v'$ and $m_{++v} = m_{++v'}$. For example,
in section 1.1, $U = K$ and $q_{uu'} = T/U$ for Forms $u$ and $u'$. Thus $\mathbf{C}$ is $T\mathbf{I}$, and
$\sigma^2(\hat{D}_v - \hat{D}_{v'}) = 2\sigma^2 U/N$. In general, if no form can be used more than $Q$ times, so that
$m_{++u} \le Q$ for $1 \le u \le U$, then $\sigma^2(\hat{D}_v - \hat{D}_{v'})$ is at least $2KT\sigma^2/(NQ)$.

For a more complex example often relevant to equating situations in which very old forms are
not directly compared with new forms, consider the case of $\mathbf{y} = \mathbf{b}$ for $b_v = 1$, $b_{v'} = -1$, and $b_u = 0$
for $u$ neither $v$ nor $v'$. For simplicity, let $v' < v$. Let $\mathbf{x}$ be defined so that $x_u = u - (v + v')/2$ for
$1 \le u \le U$. Then

\[ \sigma^2(\hat{D}_v - \hat{D}_{v'}) \ge \frac{(v - v')^2 \sigma^2}{M\, \mathbf{x}'\mathbf{C}\mathbf{x}}, \]

where

\[ \mathbf{x}'\mathbf{C}\mathbf{x} = \sum_{u=1}^{U} [u - (U+1)/2]^2 m_{++u}
   - \sum_{u=1}^{U} \sum_{u'=1}^{U} [u - (U+1)/2][u' - (U+1)/2] q_{uu'}
   = 2^{-1} \sum_{u=1}^{U} \sum_{u'=1}^{U} (u - u')^2 q_{uu'}. \]

Equality holds only if, for some real $c$,

\[ 1 = c \sum_{u'=1}^{U} (v - u') q_{vu'}, \]

\[ 0 = c \sum_{u'=1}^{U} (u - u') q_{uu'} \]

for $u$ neither $v$ nor $v'$, and

\[ -1 = c \sum_{u'=1}^{U} (v' - u') q_{v'u'}. \]

When equality holds, $\sigma^2(\hat{D}_v - \hat{D}_{v'}) = c(v - v')\sigma^2/M$. In section 1, $K = 2$, $m_{++1} = m_{++U} = 1$,
$m_{++u} = 2$ for $1 < u < U$, $T = U - 1$, $q_{uu'} = 1/2$ for $|u - u'| = 1$, and $q_{uu'} = 0$ for $|u - u'| > 1$. If
$v = U$ and $v' = 1$, then equality holds with $c = 2$, so that

\[ \sigma^2(\hat{D}_U) = \sigma^2(\hat{D}_U - \hat{D}_1) = 2(U-1)\sigma^2/M = 4T^2\sigma^2/N, \]

as expected from section 1 if one recalls that $M = N/(2T)$ in this case.
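The chain result $\sigma^2(\hat{D}_u) = 2(u-1)\sigma^2/M$ with $M = N/(2T)$ can be tabulated directly. The sketch below uses the inputs of the report's numerical illustration ($N = 110{,}000$, $T = 11$, $\sigma = 100$); the function name is hypothetical.

```python
from math import sqrt

def chain_standard_error(u, N, T, sigma):
    """Standard error of the adjustment for Form u in the two-forms-per-administration chain."""
    M = N / (2 * T)  # examinees per group and administration when K = 2
    return sqrt(2 * (u - 1) * sigma ** 2 / M)

standard_errors = {u: round(chain_standard_error(u, N=110_000, T=11, sigma=100), 2)
                   for u in range(2, 13)}
```

With these inputs the values run from 2.00 for Form 2 to 6.63 for Form 12, which are the entries of the two-forms column of Table 1.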

One may interpret $\mathbf{x}'\mathbf{C}\mathbf{x}$ for $x_u = u - (U+1)/2$ in terms of a variance. Consider random
variables $Z_1$ and $Z_2$ with integer values from 1 to $U$. Let

\[ W = KT - \sum_{u=1}^{U} q_{uu}. \]

Let $Z_1$ never equal $Z_2$, and let the joint probability that $Z_1 = u$ and $Z_2 = u'$ be $q_{uu'}/W$ for $u \ne u'$.
Then

\[ \mathbf{x}'\mathbf{C}\mathbf{x} = W\sigma^2(Z_1 - Z_2)/2, \]

so that

\[ \sigma^2(\hat{D}_v - \hat{D}_{v'}) \ge \frac{2(v - v')^2 \sigma^2}{WM\sigma^2(Z_1 - Z_2)}. \tag{9} \]

If the restriction is imposed that, for some positive integer $r$, Forms $u$ and $u'$ never appear in the
same administration if $|u - u'| > r$, then $|Z_1 - Z_2| \le r$ with probability 1, so that

\[ \sigma^2(Z_1 - Z_2) \le r^2, \]

with equality if, and only if, $r = 1$ (Haberman, 1996, p. 272), so that

\[ \sigma^2(\hat{D}_v - \hat{D}_{v'}) \ge \frac{2(v - v')^2 \sigma^2}{WMr^2}. \tag{10} \]

In particular,

\[ \sigma^2(\hat{D}_U) \ge \frac{2(U-1)^2 \sigma^2}{WMr^2}. \tag{11} \]

In (11), equality holds for $r = 1$ under the conditions in section 1. In addition, (7) and the
standard formula

\[ U^{-1} \sum_{u=1}^{U} [u - (U+1)/2]^2 = (U+1)(U-1)/12 \]

(Stuart, 1950) lead to the inequality

\[ \bar{\sigma}^2 \ge \frac{(U+1)U\sigma^2}{3WMr^2}. \tag{12} \]

Observe that $W \le KT$, so that $WM$ is no greater than $N$. If no form is used with more than a
single group in an administration, then $WM = (K-1)N/K$. The practical implication of (10),
(11), and (12) is that, for fixed sample size $N$, number $K$ of forms per administration, and positive
integer $r$, the variance of equating adjustments increases very rapidly when the number of forms is large.

To illustrate results, consider a case with 11 administrations and four forms per
administration. Consider a total $N$ of 110,000 examinees, and let the standard deviation $\sigma$ be 100.
At Administration $t$, let Forms $t$ to $t + 3$ be used. The standard deviations of the $\hat{D}_u$ are then
summarized in Table 1. For comparison, results are supplied for the approach of section 1 with
the same number of examinees and the same number of administrations. The only case in which a
form has a lower standard error of equating for two rather than four forms per administration is
for Form 2, and Form 2 is used by 10,000 examinees if there are two forms per administration and
by 5,000 examinees if there are four forms per administration. The gains are particularly dramatic
in the case of four forms per administration for higher form numbers. Nonetheless, it should be
emphasized that the standard errors of the $\hat{D}_u$ still increase as the form number $u$ increases. In
the case of four forms, the lower bound for $\sigma(\hat{D}_U)$ from (11) is 2.13, a figure considerably lower
than the actual value. The lower bound on $\sigma(\hat{D}_U)$ based on (9) is 3.51. In the case of two forms
per administration, the lower bound on $\sigma(\hat{D}_U)$ based on (11) is equal to the observed value.

Table 1

Standard Error of Equating Adjustment for Mean Equating

  Form          Standard error
 number      2 forms     4 forms
    2         2.00        2.57
    3         2.83        2.51
    4         3.46        2.50
    5         4.00        2.72
    6         4.47        2.85
    7         4.90        2.99
    8         5.29        3.12
    9         5.66        3.25
   10         6.00        3.37
   11         6.32        3.49
   12         6.63        3.62
   13                     3.80
   14                     4.18
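The lower bounds quoted in this illustration can be recomputed from (9) and (11). The code below is a sketch: the function names are hypothetical, $W$ and the variance of $Z_1 - Z_2$ are obtained by enumerating pairs of distinct forms within each administration, and the variance is computed as a second moment because $Z_1 - Z_2$ has mean 0 by the symmetry of the $q_{uu'}$.

```python
from math import sqrt

def w_and_var_z(forms_by_admin):
    """Return W = sum over u != u' of q_{uu'} and the variance of Z1 - Z2."""
    K = len(forms_by_admin[0])
    W = 0.0
    second_moment = 0.0
    for forms in forms_by_admin:
        for u in forms:
            for v in forms:
                if u != v:
                    W += 1 / K                       # contribution to q_{uv}
                    second_moment += (u - v) ** 2 / K
    return W, second_moment / W

def bound_9(v, v_prime, W, M, sigma, var_z):
    """Square root of the lower bound in (9)."""
    return sqrt(2 * (v - v_prime) ** 2 * sigma ** 2 / (W * M * var_z))

def bound_11(U, r, W, M, sigma):
    """Square root of the lower bound in (11)."""
    return sqrt(2 * (U - 1) ** 2 * sigma ** 2 / (W * M * r ** 2))

N, T, sigma = 110_000, 11, 100
four_forms = [[t, t + 1, t + 2, t + 3] for t in range(1, T + 1)]  # U = 14, r = 3
two_forms = [[t, t + 1] for t in range(1, T + 1)]                 # U = 12, r = 1

W4, var4 = w_and_var_z(four_forms)
W2, var2 = w_and_var_z(two_forms)
M4, M2 = N / (4 * T), N / (2 * T)

b11_four = bound_11(14, 3, W4, M4, sigma)
b9_four = bound_9(14, 1, W4, M4, sigma, var4)
b11_two = bound_11(12, 1, W2, M2, sigma)
```

Rounded to two decimals, these evaluate to 2.13, 3.51, and 6.63, the three figures cited in the text.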

Several caveats are needed concerning results in this section. In practice, due to the need to
report test scores in a timely manner, the $D_u$ typically must be estimated at Administration $t$
by use of the means $\bar{X}_{kt'}$ for Group $k$ used in Administration $t'$ for $t' \le t$. Bounds here permit use
of all means $\bar{X}_{kt}$ for Group $k$ in an Administration $t$, $t \le T$.

Results apply most effectively in an ideal situation in which the score distribution for the
$X_{ikt}$ is normal with mean $\mu_{kt}$ and with variance $\sigma^2$ common to all forms and administrations. This
assumption obviously does not apply exactly in commonly used educational tests.

Interactions between group and administration can greatly increase variability. Consider the
following model change. For Group $k$ at Administration $t$, let $e_{kt}$ be a random variable with
mean 0 and variance $\sigma_e^2$ that represents a random interaction of group and administration. In
many typical cases, each group receives a different form, and $e_{kt}$ is really an interaction of form
and administration. Let the $e_{kt}$ be independent, and let the $X_{ikt} - e_{kt}$ be independent random
variables with mean $\mu_{kt} = \alpha_t - D_{u_{kt}}$ and variance $\sigma^2$. Then variances of the $\hat{D}_u$ and $\hat{D}_u - \hat{D}_{u'}$,
$u \ne u'$, are multiplied by $1 + M\sigma_e^2/\sigma^2$. The practical effect of interaction is very large, for it
becomes an increasingly large fraction of the variability in $\hat{D}_u$ as the sample size $M$ for a form
used at an administration becomes increasingly large.
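The size of the multiplier $1 + M\sigma_e^2/\sigma^2$ is easy to examine numerically. In the sketch below the function name is hypothetical, and the interaction standard deviation of 1 score point is an assumed value used only for illustration; it is not an estimate from any testing program.

```python
def interaction_inflation(M, sigma, sigma_e):
    """Factor 1 + M sigma_e^2 / sigma^2 by which variances of the adjustments are multiplied."""
    return 1 + M * sigma_e ** 2 / sigma ** 2

# Assumed values: sigma = 100 and a hypothetical interaction standard deviation of 1.
inflation_small_groups = interaction_inflation(M=2_500, sigma=100, sigma_e=1)
inflation_large_groups = interaction_inflation(M=5_000, sigma=100, sigma_e=1)
```

Even an interaction standard deviation equal to 1 percent of the score standard deviation increases variances by 25 percent when $M = 2{,}500$ and by 50 percent when $M = 5{,}000$.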

Rather remarkably, the analysis for mean equating for a series of administrations with 
equivalent groups provides a general basis for discussion of equating errors. The following sections 
consider some of the many applications to other equating designs and other equating methods. 

2 Multiple Tests per Examinee 

In many equating designs, operational tests at different administrations are compared through 
internal or external anchor tests. Such equating designs can be described in terms of examinees 
who receive multiple tests. As in Section 1.2, consider a case with $T > 1$ administrations, $K$ groups
per administration, $N$ examinees, and $M = N/(KT)$ examinees per group and administration.
Let each examinee receive $H > 1$ different tests $h$, $1 \le h \le H$. For example, one might have $H = 2$
and have Test 1 be an operational test and Test 2 be an external anchor test. Other alternatives 
are possible. Test 2 might be an internal anchor test rather than an external anchor test. One 
might also have an operational test with H sections, with a score provided for each section. 

Whatever the interpretation of the tests, different forms are associated with different tests.
For this purpose, test forms will be described by pairs of integers. Thus Test $h$ can use Form $(u, h)$
for $1 \le u \le U_h$, where $U_h$ is a positive integer. At Administration $t$, for Test $h$, Group $k$ receives
Form $(u_{kth}, h)$, $1 \le u_{kth} \le U_h$. For simplicity, let $u_{11h} = 1$ for $1 \le h \le H$. The base form for
Test $h$ will be Form $(1, h)$.

For a relatively simple example, consider a testing program in which Test 1 is an operational
test and Test 2 is an external anchor test. Let all examinees at Administration $t$ receive the same
operational Form $(t, 1)$, so that $U_1 = T$. On the other hand, as in Section 1, let examinees in
Administration $t$ be divided randomly into $K = 2$ groups of equal size $M = N/(2T)$. Let Group $k$,
$1 \le k \le 2$, receive Form $(t + k - 1, 2)$. In this case, $u_{kt1} = t$ and $u_{kt2} = t + k - 1$.
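The form assignment in this example can be enumerated explicitly. The following sketch uses a hypothetical function name and an assumed value $T = 11$; it records, for each Group $k$ and Administration $t$, the operational Form $(t, 1)$ and the anchor Form $(t + k - 1, 2)$.

```python
def anchor_design(T, K=2):
    """Map (k, t) to the operational form (t, 1) and the anchor form (t + k - 1, 2)."""
    design = {}
    for t in range(1, T + 1):
        for k in range(1, K + 1):
            design[(k, t)] = {"operational": (t, 1), "anchor": (t + k - 1, 2)}
    return design

design = anchor_design(T=11)  # T = 11 is an assumed value for illustration
operational_forms = {cell["operational"] for cell in design.values()}
anchor_forms = {cell["anchor"] for cell in design.values()}
```

With $T = 11$ there are 11 operational forms and 12 anchor forms, and consecutive administrations share exactly one anchor form, which is how the operational forms are linked.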

Let Examinee $i$, $1 \le i \le M$, in Group $k$ of Administration $t$ have score $X_{ikth}$ on Form $(u_{kth}, h)$.
Let the score vectors $\mathbf{X}_{ikt}$ with elements $X_{ikth}$, $1 \le h \le H$, be independent. Let $\mathbf{X}_{ikt}$ have mean
$\boldsymbol{\mu}_{kt}$ with elements $\mu_{kth}$, $1 \le h \le H$, and positive-definite covariance matrix $\boldsymbol{\Gamma}$ with elements $\gamma_{hh'}$,
$1 \le h \le H$, $1 \le h' \le H$. For simplicity, let the covariance matrix $\boldsymbol{\Gamma}$ be known. As in Section 1.2,
an additive model for the means $\mu_{kth}$ is employed. For each Test $h$, for some form parameters
$D_{uh}$, $1 \le u \le U_h$, and administration parameters $\alpha_{th}$, $1 \le t \le T$, it is assumed that

\[ \mu_{kth} = \alpha_{th} - D_{u_{kth}h} \tag{13} \]

for $1 \le k \le K$ and $1 \le t \le T$. To identify parameters, it is assumed that $D_{1h} = 0$ for $1 \le h \le H$,
so that the administration parameter for Administration 1 and Test $h$ is $\alpha_{1h} = \mu_{11h}$. The difference
$\beta_{th} = \alpha_{th} - \alpha_{1h}$ provides a measure of the proficiency on Test $h$ of examinees at Administration $t$
relative to examinees at Administration 1, while $D_{uh}$ measures the difficulty of Form $(u, h)$ relative
to Form $(1, h)$. If the $D_{uh}$ are known, then conversions are easily accomplished. A score $x$ on
Test $h$ on Form $(u, h)$ is converted to a score $x + D_{uh}$ on Form $(1, h)$. The simplifying assumption
is made that the differences $\beta_{th}$ are proportional in the sense that

\[ \beta_{th} = \nu_h \beta_{t1}, \tag{14} \]

where the $\nu_h$ are known constants and $\nu_1 = 1$. The vector $\boldsymbol{\nu}$ has elements $\nu_h$ for $1 \le h \le H$. It is
often the case that $\nu_h = (\gamma_{hh}/\gamma_{11})^{1/2}$, so that the differences $\beta_{th}$ are proportional to the standard
deviations of the $X_{ikth}$.

Estimation of parameters is typically somewhat more complex than in Section 1.2, although 
many equating designs lead to simplified computations. Under the assumption that the covariance
matrix $\boldsymbol{\Gamma}$ is known, all remaining model parameters can be estimated by weighted least squares,
but analysis is a bit more complicated in general than in Section 1.2. Linkage involves both forms
administered to the same examinee and forms administered to different examinees in the same or 
different administrations. For example, in the example with an operational test and an external 
anchor in which the operational test is different for each administration, the operational tests are 
linked only through the external anchors. 

To describe the general use of weighted least squares, for Group $k$, Test $h$, Form $(u, h)$, and
Administration $t$, let $m_{ktuh}$ be 1 if $u_{kth} = u$ and 0 otherwise, and let $\mathbf{m}_{ktuh}$ be the vector with
elements $m_{ktuh}\delta_{hh'}$ for $1 \le h' \le H$. Let $m_{+tuh} = \sum_{k=1}^{K} m_{ktuh}$, let and let

B uhu 'h< = m ' ktu h F - 1 m ktu' h' - V’ _1 (m' A;tufe r-V)(m / A . tu , fe , r_1 i/) 
and



Cuhu'h' — B u hu'h' TT /TT -,\$h h' • 



U h (U h - 1) 



Assume that the identifiability condition holds that

\[ \sum_{h'=1}^{H} \sum_{u'=1}^{U_{h'}} C_{uhu'h'} x_{u'h'} = 0 \]

for $1 \le u \le U_h$ and $1 \le h \le H$ only if $x_{uh} = 0$ for $1 \le u \le U_h$ and $1 \le h \le H$. Let $\bar{\mathbf{X}}_{kt}$ be the
average of the score vectors $\mathbf{X}_{ikt}$ for $1 \le i \le M$, and let $\bar{\mathbf{X}}_{+t}$ be the sum of the $\bar{\mathbf{X}}_{kt}$ for $1 \le k \le K$.
One then minimizes the weighted sum of squares

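\[ \sum_{t=1}^{T} \sum_{k=1}^{K} (\bar{\mathbf{X}}_{kt} - \boldsymbol{\mu}_{kt})' \boldsymbol{\Gamma}^{-1} (\bar{\mathbf{X}}_{kt} - \boldsymbol{\mu}_{kt}) \]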

under the constraint that (13) and (14) hold and $D_{1h} = 0$ for $1 \le h \le H$. A similar argument
to that in Section 1.2 shows that the weighted least squares estimates $\hat{D}_{uh}$ of $D_{uh}$ satisfy the
equations

H u h , t K 

XX B uhu' h' D u ' h' — E E rnktuhT-^x+kt - ^(x'+^r-VH 

h ’= 1 u ’= 1 t = 1 k= 1 

for $1 \le u \le U_h$ and $1 \le h \le H$, where $\hat{D}_{1h} = 0$ for $1 \le h \le H$. A score $x$ on Form $(u, h)$ is
then converted to score $x + \hat{D}_{uh}$ on Form $(1, h)$. Variances can be computed as in weighted linear
regression.

A covariance matrix for the $\hat{D}_{uh}$ may be computed as in section 1.2, although results are
a bit more complex. To facilitate use of matrices, consider the index variables $\pi(u, 1) = u$ for
$1 \le u \le U_1$ and $\pi(u, h) = \pi(U_{h-1}, h - 1) + u$ for $1 \le u \le U_h$ and $1 < h \le H$. Let $U = \pi(U_H, H)$ be
the sum of the $U_h$ for $1 \le h \le H$. If

\[ \tilde{D}_{uh} = \hat{D}_{uh} - U_h^{-1} \sum_{u'=1}^{U_h} \hat{D}_{u'h}, \]

then the covariance $C^{uhu'h'}$ of $\tilde{D}_{uh}$ and $\tilde{D}_{u'h'}$, $1 \le u \le U_h$, $1 \le h \le H$, $1 \le u' \le U_{h'}$, $1 \le h' \le H$,
is row $\pi(u, h)$ and column $\pi(u', h')$ of the inverse of the $U$ by $U$ matrix $\mathbf{C}$ with row $\pi(u, h)$ and
column $\pi(u', h')$ equal to $C_{uhu'h'}$.
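The index map $\pi(u, h)$ simply stacks the forms of Test 1, then those of Test 2, and so on. A minimal sketch follows; the function name `pi_index` and the sizes $U_1 = 11$ and $U_2 = 12$ are assumptions chosen for illustration.

```python
def pi_index(u, h, sizes):
    """Return pi(u, h) = U_1 + ... + U_{h-1} + u, where sizes[h - 1] holds U_h."""
    if not 1 <= h <= len(sizes):
        raise ValueError("h must satisfy 1 <= h <= H")
    if not 1 <= u <= sizes[h - 1]:
        raise ValueError("u must satisfy 1 <= u <= U_h")
    return sum(sizes[: h - 1]) + u

sizes = [11, 12]                      # assumed U_1 and U_2
first_anchor_row = pi_index(1, 2, sizes)
total_dimension = pi_index(sizes[-1], len(sizes), sizes)
```

The last index, $\pi(U_H, H)$, equals the total number $U$ of forms and is therefore the dimension of the matrix $\mathbf{C}$.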

In typical applications which involve anchor tests, actual computations are much simpler
than for the general case. Consider the following situation. There are two tests per examinee, so
that $H = 2$. Test 1 is an operational test and Test 2 is an anchor test. The $m_{+tu2}$, $1 \le t \le T$,
$1 \le u \le U_2$, for Test 2 satisfy the inseparability requirement for a single test. In the case of
Test 1, only one form is used in an administration, and all examinees for the administration use
that form. Thus $m_{ktt1} = 1$ for $1 \le k \le K$, $1 \le t \le T$, $U_1 = T$, and $m_{ktu1} = 0$ if $1 \le k \le K$,
$1 \le t \le T$, $1 \le u \le U_1$, and $t \ne u$. Thus the $m_{ktu1}$ do not satisfy the inseparability conditions.
Nonetheless, for each Group $k$ and Administration $t$, $\bar{X}_{kt1}$ and $\bar{X}_{kt2}$ have correlation $\gamma_{12}/(\gamma_{11}\gamma_{22})^{1/2}$,
so that estimation of $D_{u2}$ for $1 \le u \le U_2$ is affected to some extent by the operational test
results $\bar{X}_{kt1}$. The estimates $\hat{D}_{u2}$ may be obtained as in section 1.2 from the observed differences
$X_{ikt2} - (\gamma_{12}/\gamma_{11})X_{ikt1}$, $1 \le i \le M$, $1 \le k \le K$, $1 \le t \le T$. In the computations leading to (4),
$U$ is replaced by $U_2$, $m_{ktu}$ is replaced by $m_{ktu2}$, $D_u$ is replaced by $D_{u2}$, and $\bar{X}_{kt}$ is replaced by
$\bar{X}_{kt2} - (\gamma_{12}/\gamma_{11})\bar{X}_{kt1}$. After some algebraic manipulation, one finds that the estimate $\hat{D}_{t1}$ of $D_{t1}$ is

\[ \hat{D}_{t1} = K^{-1} \left[ \nu_2^{-1} (\bar{X}_{+t2} - \bar{X}_{+12}) - (\bar{X}_{+t1} - \bar{X}_{+11})
   + \nu_2^{-1} \sum_{u=1}^{U_2} (m_{+tu2} - m_{+1u2}) \hat{D}_{u2} \right]. \]



The expectation of the equating adjustment $\hat{D}_{t1}$ for Test 1 at Administration $t$ (Form $(t, 1)$) is
$D_{t1}$. To find the variance of $\hat{D}_{t1}$, let $\mathbf{C}_2$ be the $U_2$ by $U_2$ matrix with row $u$ and column $u'$ equal to

\[ C_{uu'2} = m_{++u2}\delta_{uu'} - q_{uu'2} + \frac{(K-1)T}{U_2(U_2 - 1)}, \]

where $m_{++u2}$ is the sum of the $m_{+tu2}$ for $1 \le t \le T$ and $q_{uu'2} = K^{-1}\sum_{t=1}^{T} m_{+tu2}m_{+tu'2}$. Let the
inverse $\mathbf{C}_2^{-1}$ of $\mathbf{C}_2$ have row $u$ and column $u'$ equal to $C_2^{uu'}$. Then



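\[ \sigma^2(\hat{D}_{t1}) = \frac{2T\gamma_{11}(\nu_2 - \gamma_{12}/\gamma_{11})^2}{N\nu_2^2}
   + \frac{T(\gamma_{22} - \gamma_{12}^2/\gamma_{11})}{N\nu_2^2}
     \left[ 2 + K^{-1} \sum_{u=1}^{U_2} \sum_{u'=1}^{U_2}
       (m_{+tu2} - m_{+1u2})(m_{+tu'2} - m_{+1u'2}) C_2^{uu'} \right]. \]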



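For comparison of Administrations $t$ and $t'$, $t \ne t'$, note that $\hat{D}_{t1} - \hat{D}_{t'1}$ has mean $D_{t1} - D_{t'1}$ and
variance

\[ \sigma^2(\hat{D}_{t1} - \hat{D}_{t'1}) = \frac{2T\gamma_{11}(\nu_2 - \gamma_{12}/\gamma_{11})^2}{N\nu_2^2}
   + \frac{T(\gamma_{22} - \gamma_{12}^2/\gamma_{11})}{N\nu_2^2}
     \left[ 2 + K^{-1} \sum_{u=1}^{U_2} \sum_{u'=1}^{U_2}
       (m_{+tu2} - m_{+t'u2})(m_{+tu'2} - m_{+t'u'2}) C_2^{uu'} \right]. \]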



For fixed sample size $N$, an increase in the number $T$ of administrations obviously leads to
increased variance; however, in typical situations with a large number $U_2$ of anchor forms, the most
serious problem involves the contribution to variance due to the large number of anchor forms
rather than the large number of administrations. For example, as in Table 1, consider $T = 11$
administrations and $U_2 = 12$ anchor forms, where $K = 2$, $N = 110{,}000$, and Form $(t, 2)$ and
Form $(t + 1, 2)$ are used in Administration $t$. Observe that $M = 5{,}000$. Let $\gamma_{11} = \gamma_{22} = 10{,}000$,
let $\gamma_{12} = 7{,}500$, and let $\nu_2 = 1$. In this example,

\[ \hat{D}_{u2} = \sum_{t=1}^{u-1} \left\{ (\bar{X}_{1t2} - \bar{X}_{2t2}) - 0.75(\bar{X}_{1t1} - \bar{X}_{2t1}) \right\} \]

if $2 \le u \le T + 1$ and

\[ \hat{D}_{t1} = 2^{-1} \left[ \bar{X}_{+t2} - \bar{X}_{+12} - \bar{X}_{+t1} + \bar{X}_{+11}
   + (\hat{D}_{t2} + \hat{D}_{(t+1)2} - \hat{D}_{22}) \right]. \]

It follows after some calculation that

\[ \sigma^2(\hat{D}_{t1}) = 1 + 7[1 + 4(t - 2)]/32. \]

For example, $\sigma(\hat{D}_{11,1}) = 3.02$ is much larger than $\sigma(\hat{D}_{21}) = 1.10$.

3 Linear Equating 

In linear equating, both means and standard deviations are employed. Linear equating is
most appropriate for observed scores with normal distributions. Consider the following variation
on the model in Section 1.2. At each Administration $t$, $1 \le t \le T$, examinees are divided into
$K \ge 2$ groups of $M$ examinees, so that there are a total of $N = KTM$ examinees in the $T$
administrations. Forms 1 to $U$ are to be linked, where $U \ge 2$, and Group $k$ receives Form $u_{kt}$
at Administration $t$. There are $K \ge 2$ distinct forms used. The raw score $X_{ikt}$ of Examinee $i$
from Group $k$ at Administration $t$ is a random variable with mean $\mu_{kt}$ and variance $\sigma_{kt}^2$, and the
reliability coefficient is $\rho^2$ for Form $u_{kt}$ and Administration $t$. The $X_{ikt}$ are assumed to be mutually
independent. If $u_{kt} = u$, $1 \le u \le U$, then $m_{ktu} = 1$. Otherwise, $m_{ktu} = 0$. The definitions of the
sums $m_{+tu}$ and $m_{++u}$ are then as in Section 1.2. For some real $\alpha_t$, $1 \le t \le T$, $D_u$, $1 \le u \le U$,
$\tau_t > 0$, $1 \le t \le T$, and $\zeta_u > 0$, $1 \le u \le U$, it is assumed that

\[ \mu_{kt} = (\alpha_t - D_{u_{kt}})/\zeta_{u_{kt}} \]

and

\[ \sigma_{kt} = \tau_t/\zeta_{u_{kt}}. \]
To identify parameters, it is assumed that $\zeta_1 = 1$ and $D_1 = 0$. For convenience, it is assumed
that $u_{11} = 1$, so that $\alpha_1$ is the mean score at Administration 1 on Form 1 and $\sigma_{11} = \tau_1$ is the
corresponding standard deviation. Observe that for Forms $u$ and $u'$, in any Administration $t$ such
that, for Groups $k$ and $k'$, $u_{kt} = u$ and $u_{k't} = u'$, then

\[ \sigma_{kt}/\sigma_{k't} = \zeta_{u'}/\zeta_u \tag{15} \]

and

\[ \zeta_u \mu_{kt} + D_u = \zeta_{u'} \mu_{k't} + D_{u'}. \tag{16} \]

In linear equating, a score of $x$ on Form $u$ is converted to a score of $e_u(x) = \zeta_u x + D_u$ on
Form 1. Thus linear equating reduces to mean equating if all $\zeta_u$ are equal to 1. More generally, this
conversion rule implies that a score of $x$ on Form $u$ corresponds to a score of $\zeta_{u'}^{-1}(\zeta_u x + D_u - D_{u'})$
on Form $u'$. This conversion is consistent with (15) and (16). These equations correspond with
customary requirements for chained equating.
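The two conversion rules can be sketched as small functions. The function names and the parameter values below are hypothetical and chosen only to illustrate that converting from Form $u$ to Form $u'$ and then to Form 1 gives the same result as converting from Form $u$ to Form 1 directly.

```python
def to_base_form(x, zeta_u, D_u):
    """Convert a score x on Form u to the Form 1 scale: e_u(x) = zeta_u * x + D_u."""
    return zeta_u * x + D_u

def convert_between_forms(x, zeta_u, D_u, zeta_v, D_v):
    """Convert a score x on Form u to Form u': (zeta_u * x + D_u - D_u') / zeta_u'."""
    return (zeta_u * x + D_u - D_v) / zeta_v

# Hypothetical parameter values for two forms.
zeta_u, D_u = 1.25, 2.0
zeta_v, D_v = 0.5, -1.0

direct = to_base_form(40, zeta_u, D_u)
via_other_form = to_base_form(convert_between_forms(40, zeta_u, D_u, zeta_v, D_v),
                              zeta_v, D_v)
```

Setting both slopes to 1 reduces `to_base_form` to the mean equating rule $x + D_u$.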

If the $X_{ikt}$ are normally distributed and if the inseparability requirement of Section 1.2 is
satisfied, then the $\alpha_t$, $\zeta_u$, $D_u$, and $\tau_t$ may be estimated by use of maximum likelihood. Let hats
be used to denote maximum-likelihood estimates, so that $\hat{\alpha}_t$ is the maximum-likelihood estimate
of $\alpha_t$, $\hat{\zeta}_u$ is the maximum-likelihood estimate of $\zeta_u$, $\hat{D}_u$ is the maximum-likelihood estimate of
$D_u$, $\hat{\tau}_t$ is the maximum-likelihood estimate of $\tau_t$, and $\hat{e}_u(x)$ is the maximum-likelihood estimate of
$e_u(x)$. Standard large-sample approximations for maximum-likelihood estimates can be applied
with little complication to provide normal approximations for all maximum-likelihood estimates of
interest under the condition that $M$ becomes large. Although results simplify somewhat because
$X_{ikt} - \mu_{kt}$ is uncorrelated with $(X_{ikt} - \mu_{kt})^2$ under the normality assumption, the asymptotic
variances and covariances of parameter estimates are somewhat more complex than in mean
equating except in special cases. Normal approximations can be expressed in terms of a regression
model. Let $\chi_t = \log \tau_t$, $\hat{\chi}_t = \log \hat{\tau}_t$, $\omega_u = \log \zeta_u$, and $\hat{\omega}_u = \log \hat{\zeta}_u$. Let $\psi_{kt} = (\alpha_t - D_{u_{kt}})/\tau_t$ for
$1 \le k \le K$ and $1 \le t \le T$. Consider a hypothetical linear regression model in which

Ykt ~ 2" 1/2 (xt - Wu w ) 

and 

Ykt ~ T t ( a t ~ D Ukt ) — Ipkt^Ukt 
are independent normal random variables with common mean 0 and variance $M^{-1}$ for $1 \le k \le K$
and $1 \le t \le T$. In this model, $\chi_t$, $\alpha_t$, and $D_u$ are treated as unknown parameters to be
estimated, while $\tau_t$ and $\psi_{kt}$ are treated as known. (The relationship of $\tau_t$ to $\chi_t$ and the relationship
of $\psi_{kt}$ to $\chi_t$, $\alpha_t$, and $D_{u_{kt}}$ is ignored.) The restrictions are imposed that $D_1 = \omega_1 = 0$. Under
the inseparability assumption, the least-squares estimates $\chi_t^*$ of $\chi_t$, $\omega_u^*$ of $\omega_u$, $\alpha_t^*$ of $\alpha_t$, and $D_u^*$ of
$D_u$ are uniquely defined, unbiased, and normally distributed with variances and covariances readily
found as in standard regression analysis. The joint distribution of the estimates $\hat{\alpha}_t$, $1 \le t \le T$, $\hat{D}_u$,
$1 \le u \le U$, $\hat{\chi}_t$, $1 \le t \le T$, and $\hat{\omega}_u$, $1 \le u \le U$, is approximately the same as the joint distribution
of the hypothetical estimates $\alpha_t^*$, $1 \le t \le T$, $D_u^*$, $1 \le u \le U$, $\chi_t^*$, $1 \le t \le T$, and $\omega_u^*$, $1 \le u \le U$.
For fixed number $K$ of groups per administration and fixed number $T$ of administrations, the
approximation is increasingly accurate as the sample size $M$ per group within administration
becomes increasingly large. The estimate $\hat{\zeta}_u$ is approximately distributed as $\zeta_u(1 + \omega_u^*)$, so that
$\hat{e}_u(x)$ is approximately distributed as $\zeta_u(1 + \omega_u^*)x + D_u^*$.

If the model for mean equating holds, then $\tau_t = \tau_1$, $\chi_t = \log \tau_1$, $\omega_u = 0$, $\psi_{kt} = (\alpha_t - D_{u_{kt}})/\tau_1$,
and $\zeta_u = 1$, so that $e_u(x) = x + D_u$. In this case, linear equating leads to less satisfactory results
than does mean equating. The basic argument involves a general observation concerning regression
analysis. Consider a linear regression model of the form $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$, where $\mathbf{Y}$ is a random vector
with $n$ elements, $\mathbf{X}$ is a fixed $n$ by $p$ matrix of rank $p$ for some positive integer $p$, $\boldsymbol{\beta}$ is an unknown
fixed vector with $p$ elements, and $\mathbf{e}$ is a random vector with $n$ independent elements, each of which
has mean 0 and variance $\sigma^2 > 0$. As is well known, $\boldsymbol{\beta}$ has least-squares estimate $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$
with mean $\boldsymbol{\beta}$ and covariance matrix $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. On the other hand, if for some positive integer
$q < p$, $\beta_j = 0$ for $q < j \le p$, then one can consider the use of least squares subject to the restriction
that $\beta_j = 0$ for $q < j \le p$. In this case, a new least-squares estimate $\mathbf{b}^*$ is obtained. The elements
$b_j^*$ of $\mathbf{b}^*$ are 0 for $q < j \le p$. If $\mathbf{Z}$ is the $n$ by $q$ matrix formed from the first $q$ columns of $\mathbf{X}$,
and if $\mathbf{b}^{-}$ is the $q$-dimensional vector with elements $b_j^*$ for $1 \le j \le q$, then $\mathbf{b}^{-} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$. If $\mathbf{x}$
is a $p$-dimensional vector with elements $x_j$ for $1 \le j \le p$, some $x_j$ is not 0, and $\mathbf{z}$ is a $q$-dimensional
vector with elements $x_j$ for $1 \le j \le q$, then $\mathbf{x}'\mathbf{b}$ has variance $\sigma^2\mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}$, while $\mathbf{x}'\mathbf{b}^* = \mathbf{z}'\mathbf{b}^{-}$
has variance $\sigma^2\mathbf{z}'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}$. By the Gauss-Markov theorem (Rao, 1973, ch. 4),

\[ \mathbf{x}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x} \ge \mathbf{z}'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}. \]

Both $\mathbf{x}'\mathbf{b}$ and $\mathbf{x}'\mathbf{b}^*$ have mean $\mathbf{x}'\boldsymbol{\beta}$. These results apply to linear equating by consideration of
the case in which $\omega_u$ is assumed 0 and $\chi_t$ is assumed constant. It follows that the approximate
variance $\sigma^2(\hat{e}_u(x))$ from linear equating exceeds the variance of $\hat{D}_u$ from mean equating for each
Form $u > 1$. A practical implication of the result is that cautions concerning equating of many forms
that were developed under mean equating must also apply in the case of linear equating.
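The regression inequality can be illustrated with a very small design matrix. The example below is hypothetical and unrelated to any equating data: $\mathbf{X}$ has an intercept column and one added column ($p = 2$), the restricted model keeps only the intercept ($q = 1$), and the vector $\mathbf{x} = (1, 0)'$ picks out the intercept.

```python
from fractions import Fraction

# Hypothetical 3 by 2 design matrix: intercept column plus one added regressor.
X = [(1, 0), (1, 1), (1, 2)]
x = (1, 0)

# Elements of X'X.
s11 = sum(a * a for a, _ in X)
s12 = sum(a * b for a, b in X)
s22 = sum(b * b for _, b in X)
det = s11 * s22 - s12 * s12

# x'(X'X)^{-1}x, using the closed-form inverse of a 2 by 2 matrix.
full_model_factor = Fraction(x[0] ** 2 * s22 - 2 * x[0] * x[1] * s12 + x[1] ** 2 * s11, det)

# z'(Z'Z)^{-1}z, where Z is the first column of X and z is the first element of x.
restricted_factor = Fraction(x[0] ** 2, s11)
```

Here the variance factor is 5/6 when the added parameter is estimated and 1/3 when it is correctly fixed at 0, so estimating an unnecessary parameter increases the variance of the estimated intercept.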

An added and more general lower bound for variances for normal approximations associated
with linear equating can be obtained by consideration of the case of $\omega_u$ known. In this case,
similar arguments to those for $\omega_u$ equal 0 show that variances are very similar to those associated
with mean equating. The normal approximation for $\hat{e}_u(x)$ has a variance at least as large as the
variance obtained for $\hat{D}_u$ in section 1.2 for $\sigma^2$ equal to the smallest value of $\tau_t^2$, $1 \le t \le T$.

The linear equating arguments used here have a simple application to item response
theory. Suppose that the $X_{ikt}$ in the model for linear equating are latent variables with normal
distributions, so that they correspond to conventional $\theta$-parameters. Let each Form $u$ have $r$
dichotomous items, and let the observed response on Item $j$ for Examinee $i$ from Group $k$ at
Administration $t$ be $Y_{jikt}$ equal to 0 or 1. Let the $Y_{jikt}$, $1 \le j \le r$, $r \ge 3$, be conditionally
independent given the $X_{ikt}$. Let the conditional probability that $Y_{jikt} = 1$ given $X_{ikt} = x$ be

\[ \frac{\exp(\gamma_{jkt}x - \beta_{jkt})}{1 + \exp(\gamma_{jkt}x - \beta_{jkt})} \]

for some unknown constants $\gamma_{jkt} > 0$ and $\beta_{jkt}$. If the added restriction is imposed that $\tau_1 = 1$ and
$\alpha_1 = 0$, then all $\gamma_{jkt}$ and $\beta_{jkt}$ can be estimated by marginal maximum likelihood, together with
$\alpha_t$, $D_u$, $\tau_t$, and $\zeta_u$. Normal approximations for maximum-likelihood estimates are readily derived,
but results are relatively complicated. Nonetheless, a rather trivial lower bound can be obtained
for the variances of normal approximations for the maximum-likelihood estimate $\hat{e}_u(x)$ of $e_u(x)$.
The variance of the normal approximation for $\hat{e}_u(x)$ for the item response model is at least as
great as the variance of the normal approximation for $\hat{e}_u(x)$ which is obtained under the ordinary
case of linear equating in which the $X_{ikt}$ are directly observed (Sundberg, 1974).
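The item response function used in this argument can be written as a short function. This is only a sketch of the conditional probability itself, with a hypothetical function name and hypothetical parameter values; it does not attempt the marginal maximum likelihood estimation described in the text.

```python
from math import exp

def response_probability(x, gamma, beta):
    """Probability of a correct response, exp(gamma*x - beta) / (1 + exp(gamma*x - beta))."""
    z = gamma * x - beta
    return exp(z) / (1 + exp(z))

# Hypothetical item with gamma = 1.2 and beta = 0.3, evaluated at two latent values.
p_low = response_probability(0.0, gamma=1.2, beta=0.3)
p_high = response_probability(1.0, gamma=1.2, beta=0.3)
```

Because $\gamma_{jkt} > 0$, the probability increases with the latent value $x$. For very large values of $\gamma x - \beta$ the exponential can overflow in floating point, so a production implementation would normally use a numerically stable form.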

The arguments just used also apply if each Group $k$ at Administration $t$ has a distinct form
but nonempty subsets $V_{kt}$ of the integers 1 to $r$ exist for each Group $k$ and Administration $t$ such
that, if $u_{kt} = u_{k't'}$, then $V_{kt} = V_{k't'}$ and, for $j$ in $V_{kt}$, $\gamma_{jkt} = \gamma_{jk't'}$ and $\beta_{jkt} = \beta_{jk't'}$. Assume
that $X_{ikt}$ has mean $\alpha_t - D_{u_{kt}}$ and standard deviation $\tau_t/\zeta_{u_{kt}}$, and retain the assumption that
$\tau_1 = 1$ and $\alpha_1 = 0$. Then a value $x$ for $X_{ikt}$ for Group $k$ and Administration $t$ is adjusted to
$e_u(x) = \zeta_u x + D_u$ for Group 1 at Administration 1 if $u_{kt} = u$. Lower bounds of variances for
normal approximations for the maximum-likelihood estimate of $e_u(x)$ are found for the case of linear
equating in which the $X_{ikt}$ are known.



4 Conclusion 

The analysis provided has some strong implications in practice. Accuracy of equating 
methods is limited by sample size. For a given total sample N distributed over T administrations, 
limits on accuracy involve the number U of distinct forms employed. As the number U increases, 
the accuracy of equating results decreases. The decrease is especially severe if limits are placed on 
how long an older form can remain in use. The implications are important for programs in which 
a very large number of forms is used due to a very high frequency of administration and due to 
security concerns that limit reuse of forms. Under the assumption that the number of examinees 
per year is not materially affected by the frequency of administration, it is reasonable to expect 
that accuracy of equating will be much lower than in programs with comparable yearly volume 
in which few test forms are administered in a given year. As a consequence, the comparability of 
scores on different examinations may be compromised. Such an outcome can arise even if equating 
procedures perform perfectly and the only complication is sampling error. In the real world, in 
which equating procedures are not perfect, results can be substantially less satisfactory. 

Mitigation of the problems of equating error involves careful data collection; however, even 
the most careful data collection will have limitations if form reuse is severely restricted and the 
number of forms is very large. It is important to consider the number of forms which is sufficient 
so that inappropriate study of past forms has no realistic possibility of affecting an examinee 
score due to the limitations of human memory and due to the labor involved in such study. If the 
number of forms produced is sufficiently limited, then so is the problem of equating error. 

If reuse is not an option, then it may be necessary periodically to restart equating procedures 
with newer base forms. Such a procedure may be tolerable in cases in which test results can only 
be used for a limited period, say two years. 

Because of computer-based testing, the problem of frequent administration is likely to be a 
continuing issue. It is certainly advisable that new testing programs consider the implication of 
linking large numbers of forms prior to their first administration rather than afterwards. 







References 



Cochran, W. G., & Cox, G. M. (1957). Experimental designs (2nd ed.). New York, NY: John 
Wiley. 

Goodman, L. A. (1968). The analysis of cross-classified data: independence, quasi-independence, 
and interactions in contingency tables with or without missing entries. Journal of the 
American Statistical Association, 63, 1091-1131. 

Haberman, S. J. (1996). Advanced statistics. Volume I: Description of populations. New York, 
NY: Springer. 

Halmos, P. R. (1958). Finite-dimensional vector spaces (2nd ed.). Princeton, NJ: Van Nostrand.

Hardy, G., Littlewood, J. E., & Polya, G. (1952). Inequalities (2nd ed.). Cambridge, England:
Cambridge University Press.

Kolen, M. J., & Brennan, R. L. (2004). Test equating, scaling, and linking: Methods and practices 
(2nd ed.). New York, NY: Springer. 

Rao, C. R. (1973). Linear statistical inference and its applications (2nd ed.). New York, NY:
John Wiley.

Scheffe, H. (1959). The analysis of variance. New York, NY: John Wiley. 

Stuart, A. (1950). The cumulants of the first n natural numbers. Biometrika, 37, 446. 

Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family. 
Scandinavian Journal of Statistics, 1, 49-58. 

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. 
New York, NY: Springer. 







